Original derivative command: bin/sqoop import -connect jdbc:mysql://192.168.169.128:3306/yubei -username root -password 123456 -table yl_city_mgr_evt_info --split-by rec_id -m 4 --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite -create-hive-table -delete-target-dir -hive-database default -hive-table yl_city_mgr_evt_info Cause analysis: There may be separators such as '\n' in the MySQL field. When imported into Hive, 'n' is used as the line break by default, resulting in more records in Hive. Solution:When importing data, add the --hive-drop-import-delims option to delete \n, \r, and \01 in the field. Final derivative command: bin/sqoop import -connect jdbc:mysql://192.168.169.128:3306/yubei -username root -password 123456 -table yl_city_mgr_evt_info --split-by rec_id -m 4 --hive-drop-import-delims --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite -create-hive-table -delete-target-dir -hive-database default -hive-table yl_city_mgr_evt_info Refer to the official documentation: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html Supplement: Pitfalls encountered when importing MySQL data from Sqoop to Hive 1.sqoop import to HDFS1.1 Executing the sqoop job will automatically update the last value # sqoop incremental import script bin/sqoop job --create sqoop_hdfs_test02 --import \ --connect jdbc:mysql://localhost:3306/pactera_test \ --username root \ --password 123456 \ --table student \ --target-dir /user/sqoop/test002/ \ --fields-terminated-by "\t" \ --check-column last_modified \ --incremental lastmodified \ --last-value "2018-12-12 00:03:00" \ --append Note: The --append parameter is required, otherwise an error will be reported when running the job for the second time, as follows: At this point, the sqoop job has been built! 2.Hive creates a table and reads the data imported by sqoopcreate external table if not exists student_hive (SId int,Sname string ,Sage string,Ssex string , last_modified Timestamp) row format delimited fields terminated by '\t' location 'hdfs://node01:8020/user/sqoop/test002/'; Note: The time format in hive is timestamp. If it is set to date, DB data cannot be loaded normally. The first full load is complete, and the entire route is completely OK, and the hive table can query data. -----------------------Key dividing line----------------------- * Incremental loading in sqoop lastmodified format will save the last-value as the system time of job execution. If the check-column of the test database is less than the current system time (that is, the last-value of the previous job), the data will not be loaded. If SId=6, it will not be loaded, so change to today's time (2018-12-26 17:05) for data testing, and the data will be loaded successfully! Yoho! ! Summarize:When using the lastmodified format for sqoop incremental import, 1. Note the use of --append; 2. Last-value is the system time when the job runs. When testing data, ensure that the data is accurate and self-increasing. 3. Everything has a fixed number. Check the information and accurately locate the problems encountered by your system The above is my personal experience. I hope it can give you a reference. I also hope that you will support 123WORDPRESS.COM. If there are any mistakes or incomplete considerations, please feel free to correct me. You may also be interested in:
|
<<: Detailed tutorial on installing nacos in docker and configuring the database
>>: JavaScript imitates Xiaomi carousel effect
Effective solution for Ubuntu in virtual machine ...
<br />Just like an article, our web pages sh...
This article shares with you the specific code of...
Table of contents Preface 1. Download a single fi...
1. Introduction to inode To understand inode, we ...
1. MYSQL index Index: A data structure that helps...
<br />How can I remove the scroll bar on the...
The utf8mb4 encoding is a superset of the utf8 en...
Install crontab yum install crontabs CentOS 7 com...
Syntax: <marquee> …</marquee> Using th...
Table of contents 1. Docker enables remote access...
The CentOS Project, a 100% compatible rebuild of ...
1. Change the transparency to achieve the gradual...
This article shares the specific code of React to...
If you want to change your host name, you can fol...