I call this kind of bug a typical "Hamlet" bug: the error message is always the same, but the Internet offers a thousand different solutions, and it is hard to tell which one is the real crux of the problem.

First, look at the export command:

```sh
[root@host25 ~]# sqoop export \
  --connect "jdbc:mysql://172.16.xxx.xxx:3306/dbname?useUnicode=true&characterEncoding=utf-8" \
  --username=root --password=xxxxx \
  --table rule_tag \
  --update-key rule_code \
  --update-mode allowinsert \
  --export-dir /user/hive/warehouse/lmj_test.db/rule_tag \
  --input-fields-terminated-by '\t' \
  --input-null-string '\\N' \
  --input-null-non-string '\\N' \
  -m1
```

This export command (it writes Hive data into MySQL, so from the database's point of view it is an import, which is how I refer to it below) is syntactically correct. Next is the error:

```
# Excerpt from the job output
19/06/11 09:39:57 INFO mapreduce.Job: The url to track the job: http://dthost25:8088/proxy/application_1554176896418_0537/
19/06/11 09:39:57 INFO mapreduce.Job: Running job: job_1554176896418_0537
19/06/11 09:40:05 INFO mapreduce.Job: Job job_1554176896418_0537 running in uber mode : false
19/06/11 09:40:05 INFO mapreduce.Job:  map 0% reduce 0%
19/06/11 09:40:19 INFO mapreduce.Job:  map 100% reduce 0%
19/06/11 09:45:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_0, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_0 Timed out after 300 secs
19/06/11 09:45:36 INFO mapreduce.Job:  map 0% reduce 0%
19/06/11 09:45:48 INFO mapreduce.Job:  map 100% reduce 0%
19/06/11 09:51:04 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_1, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_1 Timed out after 300 secs
19/06/11 09:51:05 INFO mapreduce.Job:  map 0% reduce 0%
19/06/11 09:51:17 INFO mapreduce.Job:  map 100% reduce 0%
19/06/11 09:56:34 INFO mapreduce.Job: Task Id : attempt_1554176896418_0537_m_000000_2, Status : FAILED
AttemptID:attempt_1554176896418_0537_m_000000_2 Timed out after 300 secs
19/06/11 09:56:35 INFO mapreduce.Job:  map 0% reduce 0%
19/06/11 09:56:48 INFO mapreduce.Job:  map 100% reduce 0%
19/06/11 10:02:05 INFO mapreduce.Job: Job job_1554176896418_0537 failed with state FAILED due to: Task failed task_1554176896418_0537_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/06/11 10:02:05 INFO mapreduce.Job: Counters: 9
	Job Counters
		Failed map tasks=4
		Launched map tasks=4
		Other local map tasks=3
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2624852
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=1312426
		Total vcore-seconds taken by all map tasks=1312426
		Total megabyte-seconds taken by all map tasks=2687848448
19/06/11 10:02:05 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 1,333.3153 seconds (0 bytes/sec)
19/06/11 10:02:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
19/06/11 10:02:05 INFO mapreduce.ExportJobBase: Exported 0 records.
19/06/11 10:02:05 ERROR tool.ExportTool: Error during export: Export job failed!
Time taken: 1340 s
Task IDE_TASK_ADE56470-B5A3-4303-EA75-44312FF8AA0C_20190611093945147 is complete.
```

As you can see, the task stopped at `map 100% reduce 0%` and hung there for five minutes; the attempt was then restarted automatically and hung for another five minutes; after several such rounds, the job finally failed with a timeout error.
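One thing worth trying at this point: the job-level output above only says that the attempts timed out, not what the map task was doing while it hung. Assuming YARN log aggregation is enabled on the cluster, the container logs of the failed attempts can be pulled with the application ID shown in the output, which is sometimes a quicker way to see where the task is stuck:

```sh
# Fetch the aggregated container logs for the failed application
# (requires yarn.log-aggregation-enable=true on the cluster)
yarn logs -applicationId application_1554176896418_0537 | less
```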
Obviously, the direct cause of the failure is the timeout, but the timeout itself comes from the MapReduce task of the export getting stuck. Why does MapReduce get stuck? The error log says nothing about that, and this is the most troublesome part of tracking the problem down.

Let me give the result first: after a long search, I found that the data in one row was longer than the field length defined in MySQL. In other words, the task blocks when a string like "The string is very long, very long, very long, very long, very long" is written into a varchar(50) column.

Below is a summary of the causes collected from the Internet; you can check them one by one. Possible reasons for getting stuck at `map 100% reduce 0%`, taking export to MySQL as an example (minimal sketches for several of these checks are given at the end of this post):

1. Length overflow. The imported data exceeds the field length defined in the MySQL table.
   Solution: widen the field.
2. Encoding error. The imported data is not in the character set of the MySQL table.
   Solution: the MySQL encoding that actually covers the full UTF-8 range is utf8mb4, not utf8. So when the imported data contains Emoji or rare Chinese characters, it cannot be inserted and the task blocks. Pay attention to two points: (1) specify useUnicode=true&characterEncoding=utf-8 in the JDBC URL of the export command, so the data is sent as UTF-8; (2) create the MySQL table with ENGINE=InnoDB DEFAULT CHARSET=utf8mb4.
3. Insufficient memory. The amount of data being imported may be too large, or the memory allocated to the task too small.
   Solution: either import in batches or give the task more memory.
4. Incorrect host name.
   Solution: this appears to be a problem with the host name configuration on the cluster nodes.
5. Duplicate primary key. The imported data contains duplicate values of the primary (update) key.
   Solution: clean up the data before exporting.

Supplement: a fix for MapReduce getting stuck while Sqoop moves data from the database to HDFS. On another occasion, MapReduce got stuck while Sqoop was pulling data out of the database. After some searching, it turned out that the memory and virtual-memory settings of YARN needed to be configured. I had never configured them before and everything worked fine, but this time the job ran at a larger scale. The failure can occur because the memory and CPU resources allocated to each Docker container are too small to meet the default resource requirements of Hadoop and Hive. The solution is to add the following configuration to yarn-site.xml:

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>20480</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
```

Then stop YARN and start it again, and the new settings take effect.
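For completeness, a minimal restart sketch, assuming the standard Hadoop sbin scripts are available on the ResourceManager host:

```sh
# Restart YARN so the new yarn-site.xml values take effect
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh
```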
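Finally, going back to the checklist above, here are the promised sketches. For causes 1 and 2, a minimal MySQL sketch; the column name tag_name is a placeholder I made up for illustration, so check your own schema for the column that actually overflows, and re-declare any NOT NULL/DEFAULT attributes the column already has:

```sql
-- Cause 1: widen the column that is shorter than the incoming data
-- (tag_name is hypothetical; rule_tag is the target table from the export command)
ALTER TABLE rule_tag MODIFY tag_name VARCHAR(255);

-- Cause 2: convert the table to utf8mb4 so Emoji and rare characters can be stored
ALTER TABLE rule_tag CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```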
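For cause 3, Sqoop accepts Hadoop generic options, so the map task memory can be raised directly on the command line instead of (or in addition to) splitting the export into batches. This is only a sketch; the memory values are examples and should be sized to your cluster:

```sh
# -D generic options go right after "sqoop export", before the Sqoop-specific flags
sqoop export \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.map.java.opts=-Xmx3276m \
  --connect "jdbc:mysql://172.16.xxx.xxx:3306/dbname?useUnicode=true&characterEncoding=utf-8" \
  --username=root --password=xxxxx \
  --table rule_tag
  # ...plus the remaining arguments from the original command
```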
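For cause 5, a quick way to spot duplicate update-key values before exporting is to group by the key on the Hive side. The table name lmj_test.rule_tag is inferred from the export directory in the command and may differ in your environment:

```sql
-- Run in Hive: any row returned here means the update key is not unique
SELECT rule_code, COUNT(*) AS cnt
FROM lmj_test.rule_tag
GROUP BY rule_code
HAVING COUNT(*) > 1;
```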