Sqoop export stuck at map 100% reduce 0%: various causes and solutions

I call this kind of bug a typical "Hamlet" bug: the error message is always the same, but the Internet offers a thousand different solutions, leaving you wondering which one actually addresses the crux of the problem.

First, look at the export command:

[root@host25 ~]# sqoop export \
  --connect "jdbc:mysql://172.16.xxx.xxx:3306/dbname?useUnicode=true&characterEncoding=utf-8" \
  --username=root --password=xxxxx \
  --table rule_tag \
  --update-key rule_code --update-mode allowinsert \
  --export-dir /user/hive/warehouse/lmj_test.db/rule_tag \
  --input-fields-terminated-by '\t' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  -m 1

This export command is syntactically correct.

Next is the error:

# Log excerpt
19/06/11 09:39:57 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0537/
19/06/11 09:39:57 INFO mapreduce.Job: Running job: job_1554176896418_0537
19/06/11 09:40:05 INFO mapreduce.Job: Job job_1554176896418_0537 running in uber mode : false
19/06/11 09:40:05 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:40:19 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:45:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_0, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_0 Timed out after 300 secs
19/06/11 09:45:36 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:45:48 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:51:04 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_1, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_1 Timed out after 300 secs
19/06/11 09:51:05 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:51:17 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:56:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_2, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_2 Timed out after 300 secs
19/06/11 09:56:35 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:56:48 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 10:02:05 INFO mapreduce.Job: Job job_1554176896418_0537 failed with state FAILED due to: Task failed task_1554176896418_0537_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/06/11 10:02:05 INFO mapreduce.Job: Counters: 9
 Job Counters 
 Failed map tasks=4
 Launched map tasks=4
 Other local map tasks=3
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=2624852
 Total time spent by all reduces in occupied slots (ms)=0
 Total time spent by all map tasks (ms)=1312426
 Total vcore-seconds taken by all map tasks=1312426
 Total megabyte-seconds taken by all map tasks=2687848448
19/06/11 10:02:05 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 1,333.3153 seconds (0 bytes/sec)
19/06/11 10:02:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Exported 0 records.
19/06/11 10:02:05 ERROR tool.ExportTool: Error during export: Export job failed!
Time taken: 1340 s 
Task IDE_TASK_ADE56470-B5A3-4303-EA75-44312FF8AA0C_20190611093945147 is complete.

As the log shows, the export task stopped at map 100% reduce 0% and hung there for 5 minutes; the attempt was then automatically retried and hung for another 5 minutes, and after four failed attempts the job finally reported a timeout error.

Obviously, the direct cause of the failure is the timeout, but the timeout itself happens because the map task of the export hangs. (A Sqoop export is a map-only job, so reduce staying at 0% is normal; the problem is the map staying at 100% without ever finishing.) Why does the map task hang? The error log says nothing about this, which is what makes the cause so troublesome to track down.
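The 300-second figure in the log is the MapReduce task timeout: a task that does not read input, write output, or report status for that long is killed by the framework. It is controlled by mapreduce.task.timeout (in milliseconds) in mapred-site.xml; this cluster evidently has it at 300000. Raising it does not fix a genuinely blocked task, but it can help when diagnosing a slow-but-progressing export; the value below is only an illustration:

```xml
<property>
  <name>mapreduce.task.timeout</name>
  <value>1800000</value> <!-- 30 minutes, in milliseconds -->
</property>
```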

Let me give the result first. After a long search, I found it was because the data in one row exceeded the column length defined in MySQL. That is, when a string longer than 50 characters is exported into a varchar(50) column, the task blocks.

Below I summarize the various causes reported online, so you can check them one by one.

Possible causes of hanging at map 100% reduce 0% (taking export to MySQL as an example):

1. Length overflow. The exported data exceeds the column length defined in the MySQL table.

Solution: enlarge the column (for example with ALTER TABLE ... MODIFY), or clean the overlong rows before exporting.
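Before re-running the export, it can help to scan the export files for values that would overflow the target columns. A minimal sketch, assuming tab-separated lines (as produced by --input-fields-terminated-by '\t') and a hypothetical map of column index to varchar length; replace the widths with the real ones from SHOW CREATE TABLE rule_tag:

```python
# Hypothetical column widths: column index -> varchar(n) in the target table.
COLUMN_WIDTHS = {0: 50, 1: 50, 2: 200}

def find_overlong_rows(lines, widths=None, sep="\t"):
    """Yield (line_number, column_index, value) for every field that
    exceeds the declared column width of the target MySQL table."""
    widths = COLUMN_WIDTHS if widths is None else widths
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for idx, limit in widths.items():
            if idx < len(fields) and len(fields[idx]) > limit:
                yield lineno, idx, fields[idx]
```

Running this over the HDFS files (e.g. piped from hdfs dfs -cat) pinpoints exactly which rows would block the export.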

2. Encoding error. The exported data contains characters outside the character set of the MySQL table.

Solution: in MySQL, the character set named utf8 only stores up to 3 bytes per character; full UTF-8 corresponds to utf8mb4. So when your data contains emoji or certain rare Chinese characters, the row cannot be inserted and the task blocks. Pay attention to two points:

(1) Specify useUnicode=true&characterEncoding=utf-8 in the JDBC URL of the export command, so the connection uses UTF-8;

(2) Create the MySQL table with ENGINE=InnoDB DEFAULT CHARSET=utf8mb4.
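MySQL's 3-byte utf8 charset cannot store any character whose UTF-8 encoding needs 4 bytes (emoji and other characters outside the Basic Multilingual Plane). A minimal sketch for checking a string before exporting; the helper name is my own invention:

```python
def has_4byte_utf8(text):
    """Return True if the text contains a character that needs 4 bytes
    in UTF-8 -- such a character cannot be stored in a MySQL `utf8`
    (3-byte) column, only in a `utf8mb4` one."""
    return any(len(ch.encode("utf-8")) > 3 for ch in text)
```

Rows for which this returns True need a utf8mb4 target table; ordinary Chinese characters encode in 3 bytes and are fine.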

3. Insufficient memory. The amount of data being exported may be too large, or the memory allocated to the task too small.

Solution: either export in batches or allocate more memory to the task.

4. Incorrect host name resolution.

Solution: check that the cluster's host names resolve correctly (for example, the host name/IP mappings in /etc/hosts on every node); a map task can hang while trying to reach an unresolvable host.

5. Duplicate primary keys.

Solution: the data being exported contains duplicate primary key values; deduplicate or otherwise clean the data before exporting.
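Duplicate keys can be spotted in the export files before loading. A minimal sketch, assuming tab-separated rows with the key in the first column (like rule_code above):

```python
from collections import Counter

def duplicate_keys(lines, key_index=0, sep="\t"):
    """Return, sorted, the key values that occur more than once."""
    counts = Counter(line.rstrip("\n").split(sep)[key_index] for line in lines)
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty result means the key column is unique and cause 5 can be ruled out.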

Supplement: fixing a stuck MapReduce job when Sqoop exports data from the database to HDFS

While Sqoop was exporting data from the database, the MapReduce job got stuck.

An online search suggested setting the memory and virtual-memory configuration items in YARN. I had not configured these before and everything worked fine, but this job ran at a larger scale. The failure probably occurred because the memory and CPU allocated to each Docker container were too small to meet the default resource requirements of Hadoop and Hive.

The solution is as follows:

Add the following configuration to yarn-site.xml:

<property> 
 <name>yarn.nodemanager.resource.memory-mb</name> 
 <value>20480</value> 
</property> 
<property> 
 <name>yarn.scheduler.minimum-allocation-mb</name> 
 <value>2048</value> 
</property> 
<property> 
 <name>yarn.nodemanager.vmem-pmem-ratio</name> 
 <value>2.1</value> 
</property> 

Then just restart YARN for the new configuration to take effect!

The above is my personal experience. I hope it can give you a reference. I also hope that you will support 123WORDPRESS.COM. If there are any mistakes or incomplete considerations, please feel free to correct me.

