Solution for extra records appearing when importing data from MySQL into Hive

Original import command:

bin/sqoop import --connect jdbc:mysql://192.168.169.128:3306/yubei \
--username root --password 123456 \
--table yl_city_mgr_evt_info --split-by rec_id -m 4 \
--fields-terminated-by "\t" --lines-terminated-by "\n" \
--hive-import --hive-overwrite --create-hive-table \
--delete-target-dir --hive-database default --hive-table yl_city_mgr_evt_info

Cause analysis: a MySQL field may contain separator characters such as '\n'. On import, Hive treats '\n' as the record delimiter by default, so a single MySQL row containing an embedded newline becomes multiple records in Hive.
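The failure mode can be reproduced outside of Sqoop entirely. A minimal Python sketch (the row data is made up for illustration): a field value containing an embedded '\n' is written out with '\n' also serving as the record terminator, so a reader that splits on newlines sees more records than there were source rows.

```python
# Two source rows; the second row's field contains an embedded newline.
rows = [("1", "ok"), ("2", "line1\nline2")]

# Write the rows the way the original command does: '\t' between fields,
# '\n' between records.
exported = "\n".join("\t".join(fields) for fields in rows)

# Hive splits the file on '\n', so 2 source rows become 3 records.
records = exported.split("\n")
print(len(records))  # 3, not 2
```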

Solution:

When importing, add the --hive-drop-import-delims option to drop \n, \r, and \01 from string fields.
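A rough Python equivalent of what the option does to each string field (the helper name is mine, not Sqoop's): strip the characters that would break Hive's default record and field parsing before the value is written out.

```python
# Sketch of --hive-drop-import-delims: remove \n, \r, and \01 (0x01,
# Hive's default field delimiter) from a string field before export.
def drop_import_delims(value: str) -> str:
    for ch in ("\n", "\r", "\x01"):
        value = value.replace(ch, "")
    return value

print(drop_import_delims("line1\nline2\r"))  # line1line2
```

After this transformation, no field can contain a record or field delimiter, so the row counts in MySQL and Hive match.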

Final import command:

bin/sqoop import --connect jdbc:mysql://192.168.169.128:3306/yubei \
--username root --password 123456 \
--table yl_city_mgr_evt_info --split-by rec_id -m 4 \
--hive-drop-import-delims \
--fields-terminated-by "\t" --lines-terminated-by "\n" \
--hive-import --hive-overwrite --create-hive-table \
--delete-target-dir --hive-database default --hive-table yl_city_mgr_evt_info

Refer to the official documentation: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Supplement: pitfalls encountered when importing MySQL data into Hive with Sqoop

1. Sqoop import to HDFS

1.1 Executing the Sqoop job automatically updates the last-value

# Sqoop incremental import job
bin/sqoop job --create sqoop_hdfs_test02 -- import \
--connect jdbc:mysql://localhost:3306/pactera_test \
--username root \
--password 123456 \
--table student \
--target-dir /user/sqoop/test002/ \
--fields-terminated-by "\t" \
--check-column last_modified \
--incremental lastmodified \
--last-value "2018-12-12 00:03:00" \
--append

Note: the --append parameter is required; otherwise an error is reported the second time the job runs.

At this point, the Sqoop job is built.
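The selection rule behind lastmodified-mode incremental imports can be sketched in Python. This is a simplification of the documented behavior with made-up row data, not Sqoop's actual implementation: rows whose check column is at or after the saved last-value are imported, and the job's run time becomes the new last-value.

```python
from datetime import datetime

# Simplified model of a lastmodified incremental import (an assumption
# based on the documented behavior, not Sqoop's source code).
def incremental_import(rows, last_value, run_time):
    imported = [r for r in rows if r["last_modified"] >= last_value]
    return imported, run_time  # run_time is saved as the job's new last-value

rows = [
    {"sid": 6, "last_modified": datetime(2018, 12, 12, 9, 30)},  # older than last-value: skipped
    {"sid": 7, "last_modified": datetime(2018, 12, 26, 17, 5)},  # new enough: imported
]
last_value = datetime(2018, 12, 26, 12, 0)  # previous job's run time (made up)
imported, new_last_value = incremental_import(
    rows, last_value, datetime(2018, 12, 26, 18, 0)
)
print([r["sid"] for r in imported])  # [7]
```

This is why stale test rows silently disappear: any row whose check-column value predates the previous job's run time is filtered out.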

2. Create a Hive table over the Sqoop-imported data

create external table if not exists student_hive (
  SId int,
  Sname string,
  Sage string,
  Ssex string,
  last_modified timestamp
)
row format delimited fields terminated by '\t'
location 'hdfs://node01:8020/user/sqoop/test002/';

Note: the time column in Hive must be of type timestamp. If it is declared as date, the data from the database cannot be loaded correctly.
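The mismatch is easy to see with Python's datetime parsing, which is only an analogy for Hive's behavior, not its code: the exported values carry a time component, and a date-only format cannot consume it.

```python
from datetime import datetime

# The value Sqoop exports for last_modified includes a time-of-day part.
value = "2018-12-12 00:03:00"

# A timestamp-style format parses it cleanly.
ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")

# A date-only format chokes on the trailing " 00:03:00".
try:
    datetime.strptime(value, "%Y-%m-%d")
    date_only_ok = True
except ValueError:
    date_only_ok = False
print(date_only_ok)  # False
```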

The first full load completes, the whole pipeline works end to end, and the Hive table can be queried.

-----------------------Key dividing line-----------------------

* With lastmodified-mode incremental imports, Sqoop saves the system time at which the job ran as the new last-value. Rows in the source database whose check-column value is earlier than the saved last-value (i.e., the previous job's run time) are not loaded.

The row with SId=6 was therefore not loaded; after changing its last_modified to the current time (2018-12-26 17:05) for testing, the data loaded successfully.

Summary:

When using lastmodified-mode Sqoop incremental imports:

1. Remember to use --append;

2. The new last-value is the system time when the job runs, so when testing, make sure the check-column values are accurate and increasing;

3. When in doubt, consult the documentation and pinpoint exactly where the problem lies in your own environment.

The above is my personal experience; I hope it serves as a useful reference. If there are any mistakes or oversights, corrections are welcome.

