Sqoop export stuck at map 100% reduce 0%: various causes and solutions

I call this kind of bug a typical "Hamlet" bug: the error message is always the same, but the Internet offers a thousand different solutions, leaving you wondering which one actually addresses the crux of the problem.

First, look at the export command:

[root@host25 ~]# sqoop export \
  --connect "jdbc:mysql://172.16.xxx.xxx:3306/dbname?useUnicode=true&characterEncoding=utf-8" \
  --username=root --password=xxxxx \
  --table rule_tag \
  --update-key rule_code --update-mode allowinsert \
  --export-dir /user/hive/warehouse/lmj_test.db/rule_tag \
  --input-fields-terminated-by '\t' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  -m 1

This export command is syntactically correct.

Next is the error:

# Log excerpt
19/06/11 09:39:57 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0537/
19/06/11 09:39:57 INFO mapreduce.Job: Running job: job_1554176896418_0537
19/06/11 09:40:05 INFO mapreduce.Job: Job job_1554176896418_0537 running in uber mode : false
19/06/11 09:40:05 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:40:19 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:45:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_0, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_0 Timed out after 300 secs
19/06/11 09:45:36 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:45:48 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:51:04 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_1, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_1 Timed out after 300 secs
19/06/11 09:51:05 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:51:17 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:56:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_2, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_2 Timed out after 300 secs
19/06/11 09:56:35 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:56:48 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 10:02:05 INFO mapreduce.Job: Job job_1554176896418_0537 failed with state FAILED due to: Task failed task_1554176896418_0537_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/06/11 10:02:05 INFO mapreduce.Job: Counters: 9
 Job Counters 
 Failed map tasks=4
 Launched map tasks=4
 Other local map tasks=3
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=2624852
 Total time spent by all reduces in occupied slots (ms)=0
 Total time spent by all map tasks (ms)=1312426
 Total vcore-seconds taken by all map tasks=1312426
 Total megabyte-seconds taken by all map tasks=2687848448
19/06/11 10:02:05 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 1,333.3153 seconds (0 bytes/sec)
19/06/11 10:02:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Exported 0 records.
19/06/11 10:02:05 ERROR tool.ExportTool: Error during export: Export job failed!
Time taken: 1340 s 
Task IDE_TASK_ADE56470-B5A3-4303-EA75-44312FF8AA0C_20190611093945147 is complete.

As the log shows, the export task stopped at map 100% reduce 0% and hung there for 5 minutes; the attempt was then automatically retried and hung for another 5 minutes, and after four failed attempts the job finally reported a timeout error.

Obviously, the direct cause of the failure is the timeout, but the timeout itself happens because the map task of the export hangs. (A Sqoop export is a map-only job, so reduce staying at 0% is normal; the problem is the map staying at 100% without ever finishing.) Why does the map task hang? The error log says nothing about this, which is what makes the cause so troublesome to track down.
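The 300-second figure in the log is the MapReduce task timeout: a task that does not read input, write output, or report status for that long is killed by the framework. It is controlled by mapreduce.task.timeout (in milliseconds) in mapred-site.xml; this cluster evidently has it at 300000. Raising it does not fix a genuinely blocked task, but it can help when diagnosing a slow-but-progressing export; the value below is only an illustration:

```xml
<property>
  <name>mapreduce.task.timeout</name>
  <value>1800000</value> <!-- 30 minutes, in milliseconds -->
</property>
```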

Let me give the result first. After a long search, I found it was because the data in one row exceeded the column length defined in MySQL. That is, when a string longer than 50 characters is exported into a varchar(50) column, the task blocks.

Below I summarize the various causes reported online, so you can check them one by one.

Possible causes of hanging at map 100% reduce 0% (taking export to MySQL as an example):

1. Length overflow. The exported data exceeds the column length defined in the MySQL table.

Solution: enlarge the column (for example with ALTER TABLE ... MODIFY), or clean the overlong rows before exporting.
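Before re-running the export, it can help to scan the export files for values that would overflow the target columns. A minimal sketch, assuming tab-separated lines (as produced by --input-fields-terminated-by '\t') and a hypothetical map of column index to varchar length; replace the widths with the real ones from SHOW CREATE TABLE rule_tag:

```python
# Hypothetical column widths: column index -> varchar(n) in the target table.
COLUMN_WIDTHS = {0: 50, 1: 50, 2: 200}

def find_overlong_rows(lines, widths=None, sep="\t"):
    """Yield (line_number, column_index, value) for every field that
    exceeds the declared column width of the target MySQL table."""
    widths = COLUMN_WIDTHS if widths is None else widths
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for idx, limit in widths.items():
            if idx < len(fields) and len(fields[idx]) > limit:
                yield lineno, idx, fields[idx]
```

Running this over the HDFS files (e.g. piped from hdfs dfs -cat) pinpoints exactly which rows would block the export.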

2. Encoding error. The exported data contains characters outside the character set of the MySQL table.

Solution: in MySQL, the character set named utf8 only stores up to 3 bytes per character; full UTF-8 corresponds to utf8mb4. So when your data contains emoji or certain rare Chinese characters, the row cannot be inserted and the task blocks. Pay attention to two points:

(1) Specify useUnicode=true&characterEncoding=utf-8 in the JDBC URL of the export command, so the connection uses UTF-8;

(2) Create the MySQL table with ENGINE=InnoDB DEFAULT CHARSET=utf8mb4.
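MySQL's 3-byte utf8 charset cannot store any character whose UTF-8 encoding needs 4 bytes (emoji and other characters outside the Basic Multilingual Plane). A minimal sketch for checking a string before exporting; the helper name is my own invention:

```python
def has_4byte_utf8(text):
    """Return True if the text contains a character that needs 4 bytes
    in UTF-8 -- such a character cannot be stored in a MySQL `utf8`
    (3-byte) column, only in a `utf8mb4` one."""
    return any(len(ch.encode("utf-8")) > 3 for ch in text)
```

Rows for which this returns True need a utf8mb4 target table; ordinary Chinese characters encode in 3 bytes and are fine.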

3. Insufficient memory. The amount of data being exported may be too large, or the memory allocated to the task too small.

Solution: either export in batches or allocate more memory to the task.

4. Incorrect host name resolution.

Solution: check that the cluster's host names resolve correctly (for example, the host name/IP mappings in /etc/hosts on every node); a map task can hang while trying to reach an unresolvable host.

5. Duplicate primary keys.

Solution: the data being exported contains duplicate primary key values; deduplicate or otherwise clean the data before exporting.
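Duplicate keys can be spotted in the export files before loading. A minimal sketch, assuming tab-separated rows with the key in the first column (like rule_code above):

```python
from collections import Counter

def duplicate_keys(lines, key_index=0, sep="\t"):
    """Return, sorted, the key values that occur more than once."""
    counts = Counter(line.rstrip("\n").split(sep)[key_index] for line in lines)
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty result means the key column is unique and cause 5 can be ruled out.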

Supplement: fixing a stuck MapReduce job when Sqoop exports data from the database to HDFS

While Sqoop was exporting data from the database, the MapReduce job got stuck.

An online search suggested setting the memory and virtual-memory configuration items in YARN. I had not configured these before and everything worked fine, but this job ran at a larger scale. The failure probably occurred because the memory and CPU allocated to each Docker container were too small to meet the default resource requirements of Hadoop and Hive.

The solution is as follows:

Add the following configuration to yarn-site.xml:

<property> 
 <name>yarn.nodemanager.resource.memory-mb</name> 
 <value>20480</value> 
</property> 
<property> 
 <name>yarn.scheduler.minimum-allocation-mb</name> 
 <value>2048</value> 
</property> 
<property> 
 <name>yarn.nodemanager.vmem-pmem-ratio</name> 
 <value>2.1</value> 
</property> 

Then just restart YARN for the new configuration to take effect!

The above is my personal experience. I hope it can give you a reference. I also hope that you will support 123WORDPRESS.COM. If there are any mistakes or incomplete considerations, please feel free to correct me.

