Original import command:

bin/sqoop import --connect jdbc:mysql://192.168.169.128:3306/yubei --username root --password 123456 --table yl_city_mgr_evt_info --split-by rec_id -m 4 --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite --create-hive-table --delete-target-dir --hive-database default --hive-table yl_city_mgr_evt_info

Cause analysis: some of the MySQL fields may themselves contain delimiter characters such as '\n'. When the data is imported into Hive, '\n' is treated as the record terminator by default, so the Hive table ends up with more records than the source table.

Solution: add the --hive-drop-import-delims option to the import so that any \n, \r, and \01 characters are stripped from field values.

Final import command:

bin/sqoop import --connect jdbc:mysql://192.168.169.128:3306/yubei --username root --password 123456 --table yl_city_mgr_evt_info --split-by rec_id -m 4 --hive-drop-import-delims --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite --create-hive-table --delete-target-dir --hive-database default --hive-table yl_city_mgr_evt_info

Refer to the official documentation: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Supplement: pitfalls encountered when importing MySQL data into Hive with Sqoop

1. Sqoop import to HDFS

1.1 Executing the sqoop job automatically updates the last-value

# sqoop incremental import script
bin/sqoop job --create sqoop_hdfs_test02 -- import \
--connect jdbc:mysql://localhost:3306/pactera_test \
--username root \
--password 123456 \
--table student \
--target-dir /user/sqoop/test002/ \
--fields-terminated-by "\t" \
--check-column last_modified \
--incremental lastmodified \
--last-value "2018-12-12 00:03:00" \
--append

Note: the job syntax is `sqoop job --create <name> -- import ...` with a space between `--` and `import`. Also note that the --append parameter is required; without it the job fails with an error the second time it is run. At this point, the sqoop job has been created.
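The effect of --hive-drop-import-delims can be sketched with a small shell demo (hypothetical data, not an actual Sqoop invocation): a field value containing '\n' splits one logical record into two lines, while stripping the delimiter characters from the field value, as the option does, keeps the record intact.

```shell
# A field value that contains an embedded newline (made-up test data).
value="$(printf 'event part one\nevent part two')"

# Written as-is, the record splits into two lines -- Hive would count 2 rows.
printf '1\t%s\n' "$value" > /tmp/raw.tsv
wc -l < /tmp/raw.tsv        # 2 lines for 1 logical record

# What --hive-drop-import-delims effectively does: remove \n, \r and \01
# from the field value before the record is written out.
clean="$(printf '%s' "$value" | tr -d '\n\r\001')"
printf '1\t%s\n' "$clean" > /tmp/clean.tsv
wc -l < /tmp/clean.tsv      # 1 line for 1 record
```

If the embedded delimiters must be preserved rather than dropped, Sqoop also offers --hive-delims-replacement to substitute them with another string instead.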
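The lastmodified mode of the job above selects rows by comparing the check-column against the saved last-value. A rough sketch of that filter, using made-up rows and plain string comparison (which is valid for this fixed-width timestamp format); the real import also bounds the window at the job's start time:

```shell
# Hypothetical rows: id <TAB> last_modified
printf '%s\t%s\n' \
  1 '2018-12-11 23:00:00' \
  2 '2018-12-12 00:05:00' \
  6 '2018-12-01 10:00:00' \
  7 '2018-12-26 17:05:00' > /tmp/student.tsv

LAST_VALUE='2018-12-12 00:03:00'

# Rows whose check-column is >= last-value are picked up; older rows
# (like id=6 here) are skipped -- the behaviour described in the text.
awk -F'\t' -v lv="$LAST_VALUE" '$2 >= lv' /tmp/student.tsv
```

Only ids 2 and 7 survive the filter, which mirrors why a row with an old last_modified is silently skipped until its timestamp is bumped past the previous job's run time.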
2. Hive creates an external table over the data imported by sqoop

create external table if not exists student_hive (
  SId int,
  Sname string,
  Sage string,
  Ssex string,
  last_modified timestamp
)
row format delimited fields terminated by '\t'
location 'hdfs://node01:8020/user/sqoop/test002/';

Note: the time column in Hive must be of type timestamp. If it is declared as date, the data from the DB cannot be loaded correctly.

The first full load completes, the whole pipeline works end to end, and the Hive table returns data when queried.

----------------------- Key dividing line -----------------------

In lastmodified mode, an incremental sqoop import saves the system time at which the job ran as the new last-value. Rows whose check-column value is earlier than the saved last-value (that is, the previous job's run time) are not loaded. In my test, the row with SId=6 was skipped for exactly this reason; after changing its last_modified to the current time (2018-12-26 17:05), the data loaded successfully.

Summary: when using sqoop incremental imports in lastmodified mode,
1. remember to use --append;
2. the saved last-value is the system time when the job ran, so make sure your test data's check-column values are accurate and increasing;
3. every problem has a cause: consult the documentation and pinpoint exactly where your own system goes wrong.

The above is my personal experience; I hope it gives you a useful reference. If anything is wrong or incomplete, please feel free to point it out.