originThe customer ordered a customized system from Sangfor, which was based on CentOS. After a long investigation, we found that the problem was actually file corruption, not a Docker problem. Environmental InformationDocker information: $ docker info Containers: 0 Running: 0 Paused: 0 Stopped: 0 Images: 2 Server Version: 18.09.3 Storage Driver: overlay2 Backing Filesystem: xfs Supports d_type: true Native Overlay Diff: true Logging Driver: json-file Cgroup Driver: cgroupfs Plugins: Volume: local Network: bridge host macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: runc Default Runtime: runc Init Binary: docker-init containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d init version: fec3683 Security Options: seccomp Profile: default Kernel Version: 3.10.0 Operating System: CentOS Linux 7 (Core) OSType: linux Architecture: x86_64 CPUs: 20 Total Memory: 125.3 GiB Name: eds-1f21a854 ID: VZLV:26PU:ZUN6:QQEJ:GW3I:YETT:HMEU:CK6J:SIPL:CHKV:6ASN:5NDF Docker Root Dir: /data/kube/docker Debug Mode (client): false Debug Mode (server): false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: reg.wps.lan:5000 treg.yun.wps.cn 127.0.0.0/8 Registry Mirrors: https://registry.docker-cn.com/ https://docker.mirrors.ustc.edu.cn/ Live Restore Enabled: false Product License: Community Engine System Information $ uname -a Linux eds-1f21a854 3.10.0 #1 SMP Mon Sep 28 12:00:30 CST 2020 x86_64 x86_64 x86_64 GNU/Linux $ cat /etc/os-release NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7" Server Information: $ cat product_name SUGON-60G16 $ cat sys_vendor SANGFOR $ cat product_version 1.2 Troubleshooting processInstalling Docker server hangs Time 2020 10/29 19:51:
At this point, I thought some kernel bug was triggered and the kernel panicked, so the server couldn't start. Time 2020 10/30 9:07:
Then the implementation directly sent a sunflower remote connection. After I went up, I found that it was not a regular operating system, but a modified one based on centos. I didn’t find /var/log/message, and then manually executed our docker installation script. bash -x install-docker.sh Then the output information stopped at a certain step, it should be "hanging". I looked at the last debug information output by the script and found that it was followed by starting Docker, which should be triggered by starting Docker. Then after a long time, I still couldn't connect or ping it, so I asked the implementation staff to check on site whether it was a hardware server and whether there was idrac, ilo, etc. to check the information of the tty console. The on-site personnel checked the server and found it was "starting normally". I tried but still couldn't connect to it. They asked us whether the route was changed. I checked with systemctl on site and found that docker was up. The gateway still cannot be pinged from the site. It suddenly occurred to me that maybe it wasn't hung at all. . . I asked him to run uptime -s to check the last boot time, but it turned out that he had not restarted at all. . . Then we found out on site that it was an iptables problem. When we started docker, their rules were flushed. Later they made some changes and released them all. Therefore, the fact that the machine was hung when starting Docker was actually due to the influence of iptables, which caused the network to be disconnected, and the machine was not restarted at all. Start the container and it crashes Then continue, the implementation said that the above problem did not occur when installing docker on other machines before, but the above problem occurred when starting. I manually executed the deployment, and an error was reported. The script opens -x debugging and sees that an error is reported when loading the deployment image. error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.39/images/load?quiet=0: read unix @->/var/run/docker.sock: read: EOF Manual execution: $ docker load -i ./kube/images/deploy.tar error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.39/images/load?quiet=0: read unix @->/var/run/docker.sock: read: connection reset by peer I checked jounalctl and found no relevant logs for docker daemon. I searched for this error and some people said that the docker group of /var/run/docker.sock does not exist, while others solved it by directly using chmod 777. I tried it but it still doesn't work. Debug docker in the front desk to see if there is any useful information: systemctl stop docker pkill dockerd dockerd -D Open another terminal and execute the load image operation: $ docker load -i ./kube/images/deploy.tar ab6425526dab: Loading layer [===================================================>] 126.3MB/126.3MB c7fe3ea715ef: Loading layer [====================================================>] 340.3MB/340.3MB 7f7eae7934f7: Loading layer [===================================================>] 3.584kB/3.584kB e99a66706b50: Loading layer [====================================================>] 2.105MB/2.105MB 245b79de6bb7: Loading layer [====================================================>] 283.5MB/283.5MB 2a56a4432cef: Loading layer [====================================================>] 93.18kB/93.18kB 0c2ea71066ab: Loading layer [===================================================>] 276.5kB/276.5kB bb3f6df0f87c: Loading layer [====================================================>] 82.94kB/82.94kB 6f34ead3cef7: Loading layer [===================================================>] 946.7kB/946.7kB c21422dd15f6: Loading layer [====================================================>] 1.97MB/1.97MB 940288517f4c: Loading layer [====================================================>] 458.2kB/458.2kB 0c52f1af61f4: Loading layer [====================================================>] 5.12kB/5.12kB 049e7edd48bc: Loading layer [====================================================>] 1.57MB/1.57MB 73307245d702: Loading layer [====================================================>] 5.632kB/5.632kB f109309d8ffc: Loading layer [====================================================>] 2.175MB/2.175MB Loaded image: xxxxxxxxxxxx.cn/platform/deploy-amd64:ubuntu.16.04 $ docker images REPOSITORY TAG IMAGE ID CREATED SIZE xxxxxxxxxxxx.cn/platform/deploy-amd64 ubuntu.16.04 3ad94a76af5d 3 months ago 734MB The front-end log output is normal on the debugging side ... DEBU[2020-10-30T14:48:39.955963986+08:00] Applied tar sha256:049e7edd48bc46e3dd5edf89c9caa8f0f7efbb41af403c5a54dd4f1008f604a7 to d58edd0d97bb672ef40e82e45c1603ca3ceaad847d9b9fc7c9b0588087019649, size: 1518278 DEBU[2020-10-30T14:48:39.960091040+08:00] Applying tar in /data/kube/docker/overlay2/b044bd592ae800ed071208c6b2f650c5cbdc7452702f56a23b9b4ffe4236ac18/diff storage-driver=overlay2 DEBU[2020-10-30T14:48:40.059510528+08:00] Applied tar sha256:73307245d7021f9627ca0b2cbfeab3aac0b65abfd476f6ec26bb92c75892d7e2 to b044bd592ae800ed071208c6b2f650c5cbdc7452702f56a23b9b4ffe4236ac18, size: 3284 DEBU[2020-10-30T14:48:40.063040538+08:00] Applying tar in /data/kube/docker/overlay2/03918b1d275aa284532b8b9c59ca158409416f904e13cc7084c598ed343e844f/diff storage-driver=overlay2 DEBU[2020-10-30T14:48:40.148209852+08:00] Applied tar sha256:f109309d8ffcb76589ad6389e80335d986b411c80122d990ab00a02a3a916e3e to 03918b1d275aa284532b8b9c59ca158409416f904e13cc7084c598ed343e844f, size: 2072803 ^CINFO[2020-10-30T14:48:55.593523177+08:00] Processing signal 'interrupt' DEBU[2020-10-30T14:48:55.593617229+08:00] daemon configured with a 15 seconds minimum shutdown timeout DEBU[2020-10-30T14:48:55.593638628+08:00] start clean shutdown of all containers with a 15 seconds timeout... DEBU[2020-10-30T14:48:55.594074457+08:00] Unix socket /run/docker/libnetwork/ebd15186e86385c48c4c5508d5f30eb83d5d74e56f09af5c82b6d6d9d63ec8b8.sock doesn't exist. cannot accept client connections DEBU[2020-10-30T14:48:55.594106623+08:00] Cleaning up old mountid : start. INFO[2020-10-30T14:48:55.594157536+08:00] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby DEBU[2020-10-30T14:48:55.594343122+08:00] Cleaning up old mountid : done. DEBU[2020-10-30T14:48:55.594501828+08:00] Clean shutdown succeeded INFO[2020-10-30T14:48:55.594520918+08:00] stopping healthcheck following graceful shutdown module=libcontainerd INFO[2020-10-30T14:48:55.594531978+08:00] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby DEBU[2020-10-30T14:48:55.594603119+08:00] received signal signal=terminated INFO[2020-10-30T14:48:55.594739890+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4201a61b0, TRANSIENT_FAILURE module=grpc INFO[2020-10-30T14:48:55.594751465+08:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4201a61b0, CONNECTING module=grpc I looked at the systemd configuration and found nothing special. I was confused as to why it could be imported when running in the foreground. I couldn't think of how to troubleshoot it, so I suspected it might be a socket problem. I tried to use socat to forward it to tcp, but it still didn't work (here you should try adding tcp listening 127 to daemon, not through socket, and socat eventually passed the socket) $ socat -d -d TCP-LISTEN:2375,fork,bind=127.0.0.1 UNIX:/var/run/docker.sock 2020/10/30 17:39:58 socat[5201] N listening on AF=2 127.0.0.1:2375 ^[[C2020/10/30 17:42:06 socat[5201] N accepting connection from AF=2 127.0.0.1:35370 on AF=2 127.0.0.1:2375 2020/10/30 17:42:06 socat[5201] N forked off child process 11501 2020/10/30 17:42:06 socat[5201] N listening on AF=2 127.0.0.1:2375 2020/10/30 17:42:06 socat[11501] N opening connection to AF=1 "/var/run/docker.sock" 2020/10/30 17:42:06 socat[11501] N successfully connected from local address AF=1 "\0\0\0\0\0\0 \x0D\x@7\xE9\xEC\x7E\0\0\0\x01\0\0\0\0" 2020/10/30 17:42:06 socat[11501] N starting data transfer loop with FDs [6,6] and [5,5] 2020/10/30 17:42:12 socat[11501] E write(5, 0x55f098349920, 8192): Broken pipe 2020/10/30 17:42:12 socat[11501] N exit(1) 2020/10/30 17:42:12 socat[5201] N childdied(): handling signal 17 $ docker --log-level debug -H tcp://127.0.0.1:2375 load -i kube/images/deploy.tar c7fe3ea715ef: Loading layer [============================================> ] 286.9MB/340.3MB unexpected EOF It took quite a long time in the end. I was busy at the time, so I went to check another customer's problem. Then I came back here and tried to load other images on a whim. It worked. . . $ docker load -i kube/images/pause_3.1.tar e17133b79956: Loading layer [====================================================>] 744.4kB/744.4kB Loaded image: mirrorgooglecontainers/pause-amd64:3.1 $ docker load -i kube/images/tiller_v2.16.1.tar 77cae8ab23bf: Loading layer [====================================================>] 5.815MB/5.815MB 679105aa33fb: Loading layer [====================================================>] 6.184MB/6.184MB 639eab5d05b1: Loading layer [====================================================>] 40.46MB/40.46MB 87e5687e03f2: Loading layer [====================================================>] 41.13MB/41.13MB Loaded image: gcr.io/kubernetes-helm/tiller:v2.16.1 $ docker load -i kube/images/calico_v3.1.3.tar cd7100a72410: Loading layer [====================================================>] 4.403MB/4.403MB ddc4cb8dae60: Loading layer [===================================================>] 7.84MB/7.84MB 77087b8943a2: Loading layer [===================================================>] 249.3kB/249.3kB c7227c83afaf: Loading layer [====================================================>] 4.801MB/4.801MB 2e0e333a66b6: Loading layer [===================================================>] 231.8MB/231.8MB Loaded image: calico/node:v3.1.3 2580685bfb60: Loading layer [====================================================>] 50.84MB/50.84MB Loaded image: calico/kube-controllers:v3.1.3 0314be9edf00: Loading layer [====================================================>] 1.36MB/1.36MB 15db169413e5: Loading layer [===================================================>] 28.05MB/28.05MB 4252efcc5013: Loading layer [====================================================>] 2.818MB/2.818MB 76cf2496cf36: Loading layer [===================================================>] 3.03MB/3.03MB 91d3d3a16862: Loading layer [====================================================>] 2.995MB/2.995MB 18a58488ba3b: Loading layer [====================================================>] 3.474MB/3.474MB 8d8197f49da2: Loading layer [====================================================>] 27.34MB/27.34MB 7520364e0845: Loading layer [===================================================>] 9.216kB/9.216kB b9d064622bd6: Loading layer [===================================================>] 2.56kB/2.56kB Loaded image: calico/cni:v3.1.3 Only when importing this will an error occur $ docker load -i kube/images/deploy.tar error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.39/images/load?quiet=0: read unix @->/var/run/docker.sock: read: connection reset by peer Then the checksum of the file was compared on the machine that produced the package and found that it was incorrect. . . . SummarizeOne question is why the front-end can do it. Secondly, when the file is damaged and imported, the docker daemon does not flush any logs and directly resets the connection. The new version has not tested this situation. This is the end of this article about the time-consuming troubleshooting process of docker errors. For more relevant content on time-consuming troubleshooting of docker errors, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Professional MySQL development design specifications and SQL writing specifications
>>: JavaScript dynamically generates a table with row deletion function
To achieve the following goals: Mysql database wi...
This article records the installation and configu...
The display without the effect picture is just em...
React multiple ways to get the value of the input...
What are the lifecycle functions of React compone...
Batch replace part of the data of a field in MYSQ...
Table of contents 1. Event Processing Model 1. Ev...
Table of contents 1. Why is JavaScript single-thr...
Note 1: Solve the problem of slow connection to M...
Table of contents 01 Introduction to GTID 02 How ...
Table of contents Solution: 1. IGNORE 2. REPLACE ...
Preface This article mainly introduces the releva...
Less is More is a catchphrase for many designers....
Recently I wrote in my blog that in the project l...
After installing the database, if you accidentall...