Solution for importing more data from MySQL into Hive

Solution for importing more data from MySQL into Hive

Original derivative command:

bin/sqoop import -connect jdbc:mysql://192.168.169.128:3306/yubei -username root -password 123456 -table yl_city_mgr_evt_info --split-by rec_id -m 4 --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite -create-hive-table -delete-target-dir -hive-database default -hive-table yl_city_mgr_evt_info

Cause analysis: There may be separators such as '\n' in the MySQL field. When imported into Hive, 'n' is used as the line break by default, resulting in more records in Hive.

Solution:

When importing data, add the --hive-drop-import-delims option to delete \n, \r, and \01 in the field.

Final derivative command:

bin/sqoop import -connect jdbc:mysql://192.168.169.128:3306/yubei -username root -password 123456 -table yl_city_mgr_evt_info --split-by rec_id -m 4 --hive-drop-import-delims --fields-terminated-by "\t" --lines-terminated-by "\n" --hive-import --hive-overwrite -create-hive-table -delete-target-dir -hive-database default -hive-table yl_city_mgr_evt_info

Refer to the official documentation: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Supplement: Pitfalls encountered when importing MySQL data from Sqoop to Hive

1.sqoop import to HDFS

1.1 Executing the sqoop job will automatically update the last value

# sqoop incremental import script bin/sqoop job --create sqoop_hdfs_test02 --import \
--connect jdbc:mysql://localhost:3306/pactera_test \
--username root \
--password 123456 \
--table student \
--target-dir /user/sqoop/test002/ \
--fields-terminated-by "\t" \
--check-column last_modified \
--incremental lastmodified \
--last-value "2018-12-12 00:03:00" \
--append

Note: The --append parameter is required, otherwise an error will be reported when running the job for the second time, as follows:

At this point, the sqoop job has been built!

2.Hive creates a table and reads the data imported by sqoop

create external table if not exists student_hive (SId int,Sname string ,Sage string,Ssex string , last_modified Timestamp) 
row format delimited fields terminated by '\t' location 'hdfs://node01:8020/user/sqoop/test002/';

Note: The time format in hive is timestamp. If it is set to date, DB data cannot be loaded normally.

The first full load is complete, and the entire route is completely OK, and the hive table can query data.

-----------------------Key dividing line-----------------------

* Incremental loading in sqoop lastmodified format will save the last-value as the system time of job execution. If the check-column of the test database is less than the current system time (that is, the last-value of the previous job), the data will not be loaded.

If SId=6, it will not be loaded, so change to today's time (2018-12-26 17:05) for data testing, and the data will be loaded successfully! Yoho! !

Summarize:

When using the lastmodified format for sqoop incremental import,

1. Note the use of --append;

2. Last-value is the system time when the job runs. When testing data, ensure that the data is accurate and self-increasing.

3. Everything has a fixed number. Check the information and accurately locate the problems encountered by your system

The above is my personal experience. I hope it can give you a reference. I also hope that you will support 123WORDPRESS.COM. If there are any mistakes or incomplete considerations, please feel free to correct me.

You may also be interested in:
  • How to quickly import data into MySQL
  • MySQL 4 methods to import data
  • How to use Navicat to export and import mysql database
  • Java uses MYSQL LOAD DATA LOCAL INFILE to import large amounts of data into MySQL
  • Detailed explanation of MySQL command line export and import database instance
  • Steps to import mysql database under wampserver
  • MySQL command line export and import database
  • Use mysqldump to import data and mysqldump incremental backup (mysqldump usage)
  • Several different ways to import data into MYSQL
  • Modify php.ini to implement the maximum limit of Mysql import database file modification method
  • Steps for importing tens of millions of data into MySQL using .Net Core

<<:  Detailed tutorial on installing nacos in docker and configuring the database

>>:  JavaScript imitates Xiaomi carousel effect

Recommend

Solution to Ubuntu cannot connect to the network

Effective solution for Ubuntu in virtual machine ...

XHTML Getting Started Tutorial: Commonly Used XHTML Tags

<br />Just like an article, our web pages sh...

js realizes the dynamic loading of data by waterfall flow bottoming out

This article shares with you the specific code of...

Basic usage of wget command under Linux

Table of contents Preface 1. Download a single fi...

Detailed explanation of Linux index node inode

1. Introduction to inode To understand inode, we ...

Detailed explanation of the role of explain in MySQL

1. MYSQL index Index: A data structure that helps...

Useful codes for web page creation

<br />How can I remove the scroll bar on the...

How to change the encoding of MySQL database to utf8mb4

The utf8mb4 encoding is a superset of the utf8 en...

Make your text dance with the marquee attribute in HTML

Syntax: <marquee> …</marquee> Using th...

Intellij IDEA quick implementation of Docker image deployment method steps

Table of contents 1. Docker enables remote access...

CentOS 8 officially released based on Red Hat Enterprise Linux 8

The CentOS Project, a 100% compatible rebuild of ...

Three ways to achieve text flashing effect in CSS3 Example code

1. Change the transparency to achieve the gradual...

React's method of realizing secondary linkage

This article shares the specific code of React to...

How to permanently change the host name in Linux

If you want to change your host name, you can fol...