During crawler development you have probably run into the situation where a crawler needs to be deployed on multiple servers. What do you do then? SSH into each server one by one, pull the code with git, and run it? When the code changes, do you log in to every server again and update them one at a time?

Sometimes the crawler only needs to run on one server, sometimes on 200. How do you switch quickly? Log in to every server to turn it on and off? Or be clever and set a modifiable flag in Redis so that only the crawlers on the servers matching the flag actually run?

Crawler A is already deployed on every server, and now you have written crawler B. Do you have to log in to each server one by one and deploy it again? If you did, you will regret not having seen this article sooner. After reading it, you will be able to:

Deploy a new crawler to 50 servers in 2 minutes:

docker build -t localhost:8003/spider:0.01 .
docker push localhost:8003/spider:0.01
docker service create --name spider --replicas 50 --network host 45.77.138.242:8003/spider:0.01

Scale the crawler from 50 to 500 servers in 30 seconds:

docker service scale spider=500

Shut down the crawlers on all servers in 30 seconds:

docker service scale spider=0

Update the crawler on all machines in 1 minute:

docker build -t localhost:8003/spider:0.02 .
docker push localhost:8003/spider:0.02
docker service update --image 45.77.138.242:8003/spider:0.02 spider

This article will not teach you how to use Docker, so please make sure you have some Docker basics before reading on.

What is Docker Swarm?

Docker Swarm is a cluster management module that ships with Docker. It can create and manage Docker clusters.

Environment Construction

This article uses three Ubuntu 18.04 servers for the demonstration, arranged as follows:

Master: 45.77.138.242
Slave-1: 199.247.30.74
Slave-2: 95.179.143.21

Docker Swarm is a module built on top of Docker, so Docker must first be installed on all three servers. Once Docker is installed, all operations are done through Docker.

Install Docker on the Master

Install Docker on the Master server by executing the following commands in sequence:

apt-get update
apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
apt-get update
apt-get install -y docker-ce

Creating a Manager Node

A Docker Swarm cluster needs a Manager node. Initialize the Master server as the Manager node of the cluster by running:

docker swarm init

After the command finishes, you can see the returned result shown in the figure below. The output contains a command of the form:

docker swarm join --token SWMTKN-1-0hqsajb64iynkg8ocp8uruktii5esuo4qiaxmqw2pddnkls9av-dfj7nf1x3vr5qcj4cqiusu4pv 45.77.138.242:2377

This command needs to be executed on each Slave node later, so record it now. After initialization you have a Docker cluster containing only one server. Execute the following command:

docker node ls

You can see the current status of the cluster, as shown in the figure below.

Create a private source (optional)

Creating a private source is not required.
The reason a private source is needed is that the project's Docker image may involve company secrets and cannot be uploaded to a public platform such as DockerHub. If your image can be uploaded publicly to DockerHub, or you already have a private image repository available, you can use that directly and skip this section and the next one.

The private source is itself a Docker image. Pull it down first:

docker pull registry:latest

As shown in the figure below. Now start the private source:

docker run -d -p 8003:5000 --name registry -v /tmp/registry:/tmp/registry docker.io/registry:latest

As shown in the figure below. In the startup command the exposed port is set to 8003, so the address of the private source is:

45.77.138.242:8003

Hint: a private source built this way uses HTTP and has no authentication mechanism, so if it is exposed to the public network you need a firewall with an IP whitelist to keep the data safe.

Allow Docker to use a trusted HTTP private source (optional)

If you built your own private source with the command in the previous section, you need to configure Docker to trust it, because Docker refuses to use HTTP private sources by default. Configure Docker with the following command:

echo '{ "insecure-registries":["45.77.138.242:8003"] }' >> /etc/docker/daemon.json

Then restart Docker:

systemctl restart docker

As shown in the figure below. After the restart completes, the Manager node is configured.
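Before moving on to the Slave servers, it can be worth confirming that the private source is reachable over plain HTTP. Below is a minimal sketch that queries the Docker Registry v2 API's /v2/_catalog endpoint; this check is my own addition (not part of the original setup), it needs the requests library installed, and the repository list will be empty until an image has been pushed.

import requests

# The private source started above listens on 45.77.138.242:8003 over HTTP.
REGISTRY = 'http://45.77.138.242:8003'

# /v2/_catalog is the standard Docker Registry v2 endpoint that lists repositories.
resp = requests.get(f'{REGISTRY}/v2/_catalog', timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {'repositories': []} before any image has been pushed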
Create a child node initialization script

For the Slave servers, only three things need to be done: install Docker, configure Docker to trust the HTTP private source, and join the cluster. From then on, everything else is managed by Docker Swarm itself, and you will no longer need to log in to the servers over SSH. To simplify the operation, write a shell script and run it in batches. Create a file named init.sh with the following content:

apt-get update
apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
apt-get update
apt-get install -y docker-ce
echo '{ "insecure-registries":["45.77.138.242:8003"] }' >> /etc/docker/daemon.json
systemctl restart docker
docker swarm join --token SWMTKN-1-0hqsajb64iynkg8ocp8uruktii5esuo4qiaxmqw2pddnkls9av-dfj7nf1x3vr5qcj4cqiusu4pv 45.77.138.242:2377

Make the file executable and run it:

chmod +x init.sh
./init.sh

As shown in the figure below. After the script finishes, you can log out of SSH on Slave-1 and Slave-2; there is no need to log in again. Go back to the Master server and execute the following command to confirm that the cluster now has 3 nodes:

docker node ls

You can see that there are now 3 nodes in the cluster, as shown in the figure below. The most complicated and troublesome part is now over; all that is left is to experience the convenience brought by Docker Swarm.

Creating a test program

Build and test Redis

Since we need to simulate the operation of a distributed crawler, we first use Docker to build a temporary Redis service. Execute the following command on the Master server:

docker run -d --name redis -p 7891:6379 redis --requirepass "KingnameISHandSome8877"

This Redis maps container port 6379 to port 7891 on the host, and its password is KingnameISHandSome8877.

Writing a test program

Write a simple Python program, spider.py:

import time

import redis

client = redis.Redis(host='45.77.138.242', port=7891, password='KingnameISHandSome8877')

while True:
    data = client.lpop('example:swarm:spider')
    if not data:
        break
    print(f'The data I am getting now is: {data.decode()}')
    time.sleep(10)

This program pops one item from Redis every 10 seconds and prints it, and exits once the list is empty.

Writing a Dockerfile

Write a Dockerfile that builds our own image on top of the Python 3.6 image:

FROM python:3.6
LABEL maintainer='[email protected]'
USER root
ENV PYTHONUNBUFFERED=0
ENV PYTHONIOENCODING=utf-8
RUN python3 -m pip install redis
COPY spider.py spider.py
CMD python3 spider.py

Build the image

After writing the Dockerfile, execute the following command to build our own image:

docker build -t localhost:8003/spider:0.01 .

It is important to note that, since we want to upload this image to the private source so that it can be pulled by the Slave servers, the image name has to follow the format "private source address/image name:version". The whole process is shown in the figure below.

Upload the image to the private source

After the image is built, it needs to be uploaded to the private source:

docker push localhost:8003/spider:0.01

As shown in the figure below. Remember the build and push commands; you will need these two commands every time you update the code in the future.
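Before starting the service, it helps to put some test data into the example:swarm:spider list that spider.py pops from, so that the containers have something to print. A minimal seeding sketch, not in the original article, that you can run from any machine that can reach the Redis started above (the sample values are arbitrary):

import redis

client = redis.Redis(host='45.77.138.242', port=7891, password='KingnameISHandSome8877')

# Push a few sample items onto the list that the crawler reads from.
for i in range(20):
    client.rpush('example:swarm:spider', i)

print(client.llen('example:swarm:spider'))  # number of items waiting in the queue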
Creating a Service

Docker Swarm runs services one by one, so you need to use the docker service command to create a service:

docker service create --name spider --network host 45.77.138.242:8003/spider:0.01

This command creates a service named spider, which by default runs in a single container. Of course, you can also start multiple containers at once by adding a --replicas parameter, for example 50 containers at the same time:

docker service create --name spider --replicas 50 --network host 45.77.138.242:8003/spider:0.01

However, the initial code may still have many bugs, so it is better to start with a single container, watch the logs, and scale out only after you find no problems.

Going back to the default case of a single container, the container may be on any of the three machines. Observe it with the following command:

docker service ps spider

As shown in the figure below.

View Node Log

From the output of the command above you can see the ID of the running container. Track its log with:

docker service logs -f <container ID>

The log of this container will then be followed continuously, as shown in the figure below.

Horizontal Scaling

Right now only one server is running a container. To run this crawler on three servers, only one command is needed:

docker service scale spider=3

The running effect is shown in the figure below. Checking the crawler's status again, you can find that one container is running on each of the three machines, as shown in the figure below. Now log in to the Slave-1 machine to see whether a task is really running there, as shown in the figure below. You can see that a container is indeed running on it; it was assigned automatically by Docker Swarm.

Now forcibly shut down Docker on Slave-1 with the following command and see what happens:

systemctl stop docker

Go back to the Master server and check the crawler's status again, as shown in the figure below. As you can see, once Docker Swarm detects that Slave-1 has gone offline, it automatically finds another machine to start the task on, making sure that three tasks are always running. In this example, Docker Swarm automatically starts two spider containers on the Master machine.

If the machines perform well, you can even run more containers on each machine:

docker service scale spider=10

At this point 10 containers will be started to run the crawler, and these 10 crawlers are isolated from each other.

What if you want to stop all the crawlers? Very simple, one command:

docker service scale spider=0

This stops all the crawlers.
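How do you know when the queue has been drained and it is safe to scale to zero? A small sketch, not part of the original article, that polls the length of the example:swarm:spider list until it is empty:

import time

import redis

client = redis.Redis(host='45.77.138.242', port=7891, password='KingnameISHandSome8877')

# Poll the queue length every 30 seconds; once it reaches 0, every item has been
# consumed and it is safe to run: docker service scale spider=0
while True:
    remaining = client.llen('example:swarm:spider')
    print(f'Items still waiting in the queue: {remaining}')
    if remaining == 0:
        break
    time.sleep(30)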
View logs of multiple containers simultaneously

What if you want to watch all the containers at once? You can use the following command to view the latest 20 lines of logs of every container (the service in this article is named spider):

docker service ps spider | grep Running | awk '{print $1}' | xargs -i docker service logs --tail 20 {}

The logs are then displayed in order, as shown in the figure below.

Update crawler

If you change the code, you need to update the crawler. First modify the code, rebuild the image, and push the new image to the private source, as shown in the figure below. Next, update the image used by the service. There are two ways to do this. One is to scale the crawlers down to zero first and then update:

docker service scale spider=0
docker service update --image 45.77.138.242:8003/spider:0.02 spider
docker service scale spider=3

The other is to execute the update command directly:

docker service update --image 45.77.138.242:8003/spider:0.02 spider

The difference between them is that when the update command is executed directly, the running containers are updated one by one, in a rolling fashion. The running effect is shown in the figure below.

You can do more with Docker Swarm

This article uses a simulated crawler as its example, but obviously any program that can be run in batches can be run with Docker Swarm, whether it uses Redis or Celery to communicate and whether or not it needs to communicate at all: as long as it can run in batches, Docker Swarm can run it. Within the same Swarm cluster you can run multiple different services without them affecting each other. You really can build a Docker Swarm cluster once and then never worry about it again; all future operations only need to be run on the server where the Manager node is located.

The above is the full content of this article. I hope it will be helpful for everyone's study.