Sqoop export stuck at map 100% reduce 0%: causes and solutions


I call this kind of bug a typical "Hamlet" bug: the error message is always the same, but the Internet offers a thousand different solutions, leaving you wondering which one addresses the actual problem.

First, look at the export command:

[root@host25 ~]# sqoop export \
  --connect "jdbc:mysql://172.16.xxx.xxx:3306/dbname?useUnicode=true&characterEncoding=utf-8" \
  --username=root --password=xxxxx --table rule_tag \
  --update-key rule_code --update-mode allowinsert \
  --export-dir /user/hive/warehouse/lmj_test.db/rule_tag \
  --input-fields-terminated-by '\t' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  -m 1

This export command is syntactically correct.

Next is the error:

# Excerpt from the job log
19/06/11 09:39:57 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0537/
19/06/11 09:39:57 INFO mapreduce.Job: Running job: job_1554176896418_0537
19/06/11 09:40:05 INFO mapreduce.Job: Job job_1554176896418_0537 running in uber mode : false
19/06/11 09:40:05 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:40:19 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:45:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_0, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_0 Timed out after 300 secs
19/06/11 09:45:36 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:45:48 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:51:04 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_1, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_1 Timed out after 300 secs
19/06/11 09:51:05 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:51:17 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 09:56:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_2, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_2 Timed out after 300 secs
19/06/11 09:56:35 INFO mapreduce.Job: map 0% reduce 0%
19/06/11 09:56:48 INFO mapreduce.Job: map 100% reduce 0%
19/06/11 10:02:05 INFO mapreduce.Job: Job job_1554176896418_0537 failed with state FAILED due to: Task failed task_1554176896418_0537_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/06/11 10:02:05 INFO mapreduce.Job: Counters: 9
 Job Counters 
 Failed map tasks=4
 Launched map tasks=4
 Other local map tasks=3
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=2624852
 Total time spent by all reduces in occupied slots (ms)=0
 Total time spent by all map tasks (ms)=1312426
 Total vcore-seconds taken by all map tasks=1312426
 Total megabyte-seconds taken by all map tasks=2687848448
19/06/11 10:02:05 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 1,333.3153 seconds (0 bytes/sec)
19/06/11 10:02:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Exported 0 records.
19/06/11 10:02:05 ERROR tool.ExportTool: Error during export: Export job failed!
Time taken: 1340 s 
Task IDE_TASK_ADE56470-B5A3-4303-EA75-44312FF8AA0C_20190611093945147 is complete.

As the log shows, the export job hung at INFO mapreduce.Job: map 100% reduce 0% for five minutes (a Sqoop export is map-only, so reduce staying at 0% is itself normal). The task attempt was then automatically restarted, hung for another five minutes, and after four failed attempts the job finally failed with a timeout error.

Obviously, the direct cause of the failure is the timeout, but the timeout happens because the export's map task is blocked. Why the map task is blocked is not mentioned anywhere in the error log, and that is the most troublesome part of diagnosing this problem.
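The 300-second limit seen in the log is the standard MapReduce task timeout, controlled by mapreduce.task.timeout (milliseconds). Raising it only postpones the failure if the task is genuinely stuck, but it is worth knowing where the number comes from; a fragment for mapred-site.xml:

```xml
<!-- mapred-site.xml: how long (in ms) a task may go without reporting
     progress before the ApplicationMaster kills the attempt. Raising
     this only delays the failure when the task is truly blocked. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>600000</value>
</property>
```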

Let me give the conclusion first. After a long search, I found the cause: the data in one row exceeded the column length defined in MySQL. That is, when a string far longer than 50 characters ("very, very, very, very long…") is written into a varchar(50) column, the task blocks.

Here I will summarize the various reasons on the Internet. You can check them one by one.

Possible causes of getting stuck at map 100% reduce 0% (taking export to MySQL as the example):

1. Length overflow. A field in the exported data exceeds the column length defined in the MySQL table.

Solution: widen the column (e.g. ALTER TABLE ... MODIFY ... with a larger VARCHAR length), or truncate/clean the offending data before export.
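Before re-running the export, you can scan the extracted rows for fields that would overflow. A minimal sketch, assuming tab-separated input; the column-to-width map is hypothetical and must match your own table definition:

```python
# Sketch: flag rows whose fields exceed the MySQL column widths.
# The width map below is an assumption -- fill in your table's real
# column indexes and VARCHAR lengths (MySQL counts characters).
MAX_WIDTHS = {0: 50, 1: 200}  # column index -> varchar length

def find_overlong_rows(lines, widths=MAX_WIDTHS, sep="\t"):
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        for idx, limit in widths.items():
            if idx < len(fields) and len(fields[idx]) > limit:
                bad.append((lineno, idx, len(fields[idx])))
    return bad

rows = ["ok\tshort", ("x" * 60) + "\tfine"]
print(find_overlong_rows(rows))  # row 2, column 0 is 60 chars > 50
```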

2. Encoding error. The exported data contains characters the MySQL table's character set cannot store.

Solution: in MySQL, the character set named utf8 is not full UTF-8 — it stores at most 3 bytes per character; the full encoding is utf8mb4. So when your data contains emoji or certain rare Chinese characters, the insert fails and the task blocks. Pay attention to two points:

(1) specify useUnicode=true&characterEncoding=utf-8 in the JDBC URL of the export command, so the connection transfers data as UTF-8;

(2) in the MySQL CREATE TABLE statement, set ENGINE=InnoDB DEFAULT CHARSET=utf8mb4.
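The characters that break MySQL's 3-byte utf8 are exactly those outside the Basic Multilingual Plane (code point above U+FFFF), which includes emoji. A small sketch to find values that require utf8mb4:

```python
# Sketch: detect values that MySQL's 3-byte "utf8" charset cannot store.
# Any character with a code point above U+FFFF (emoji, rare CJK
# ideographs) needs the utf8mb4 charset instead.
def needs_utf8mb4(value: str) -> bool:
    return any(ord(ch) > 0xFFFF for ch in value)

samples = ["plain ascii", "中文", "emoji \U0001F600"]
print([s for s in samples if needs_utf8mb4(s)])  # only the emoji string
```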

3. Insufficient memory. The volume of data being exported is too large for the memory allocated to the task.

Solution: either export in batches, or allocate more memory to the task.
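One way to give each map task more memory (assuming Hadoop 2.x / MRv2 property names) is in mapred-site.xml; the values below are illustrative, not recommendations:

```xml
<!-- mapred-site.xml: container size and JVM heap for map tasks.
     Values are illustrative -- size them to your cluster, keeping
     the heap (-Xmx) below the container limit. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m</value>
</property>
```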

4. The host name is incorrect.

Solution: this is reportedly a host-name configuration problem — check that the cluster host names resolve correctly (for example, the entries in /etc/hosts on every node).

5. Duplicate primary keys.

Solution: the exported data contains duplicate values for the primary key (or the --update-key column); deduplicate or otherwise clean the data before exporting.
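Since the command above uses --update-key rule_code, duplicate rule_code values in the extract can produce conflicting updates. A minimal dedupe sketch over tab-separated rows; the assumption that the key is the first column is hypothetical and must match your table's column order:

```python
# Sketch: keep only the last row seen for each value of the key column.
# key_idx=0 assumes the update key is the first field -- verify this
# against your table's actual column order before using.
def dedupe_by_key(rows, key_idx=0, sep="\t"):
    seen = {}
    for row in rows:
        key = row.split(sep)[key_idx]
        seen[key] = row  # later rows overwrite earlier ones
    return list(seen.values())

rows = ["A\tfirst", "B\tonly", "A\tsecond"]
print(dedupe_by_key(rows))  # ['A\tsecond', 'B\tonly']
```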

Supplement: fixing a stuck MapReduce job when Sqoop transfers data between the database and HDFS

While Sqoop was transferring data from the database, the MapReduce job got stuck.

After some searching, it appeared I needed to set the memory and virtual-memory configuration items in YARN. I had never configured these before and things worked fine, but this run was on a larger scale. The failure likely occurred because the memory and CPU resources allocated to each Docker container were too small to meet the default resource requirements of Hadoop and Hive.

The solution is as follows:

Add the following configuration to yarn-site.xml:

<property> 
 <name>yarn.nodemanager.resource.memory-mb</name> 
 <value>20480</value> 
</property> 
<property> 
 <name>yarn.scheduler.minimum-allocation-mb</name> 
 <value>2048</value> 
</property> 
<property> 
 <name>yarn.nodemanager.vmem-pmem-ratio</name> 
 <value>2.1</value> 
</property> 

Then restart YARN for the configuration to take effect (e.g. stop-yarn.sh followed by start-yarn.sh).

The above is my personal experience; I hope it serves as a useful reference, and I also hope you will support 123WORDPRESS.COM. If anything is wrong or incomplete, corrections are welcome.

