Building a Scrapy crawler image from a Dockerfile based on Alpine

1. Download the alpine image

[root@DockerBrian ~]# docker pull alpine
Using default tag: latest
Trying to pull repository docker.io/library/alpine ...
latest: Pulling from docker.io/library/alpine
4fe2ade4980c: Pull complete
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Downloaded newer image for docker.io/alpine:latest
[root@docker43 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/alpine latest 196d12cf6ab1 3 weeks ago 4.41 MB

2. Write the Dockerfile

Create a scrapy directory to hold the Dockerfile

[root@DockerBrian ~]# mkdir /opt/alpineDockerfile/
[root@DockerBrian ~]# cd /opt/alpineDockerfile/
[root@DockerBrian alpineDockerfile]# mkdir scrapy && cd scrapy && touch Dockerfile
[root@DockerBrian alpineDockerfile]# cd scrapy/
[root@DockerBrian scrapy]# ll
total 4
-rw-r--r-- 1 root root 1394 Oct 10 11:36 Dockerfile
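
The same layout can also be created in one pass. This is a minimal sketch, assuming a POSIX shell; `base_dir` is a hypothetical variable that points into a temporary directory here so the commands can be tried without root, whereas the article uses /opt/alpineDockerfile:

```shell
# Hypothetical base directory; the article uses /opt/alpineDockerfile.
base_dir="$(mktemp -d)/alpineDockerfile"

# -p creates missing parent directories and does not fail if they exist.
mkdir -p "$base_dir/scrapy"

# Create an empty Dockerfile inside the scrapy directory.
touch "$base_dir/scrapy/Dockerfile"

# List the result, as "ll" does in the session above.
ls -l "$base_dir/scrapy"
```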

Writing a Dockerfile

# Specify the base image
FROM alpine

# Author information
MAINTAINER alpine_python3_scrapy ([email protected])

# Switch the apk repositories to the Alibaba Cloud mirror
RUN echo "http://mirrors.aliyun.com/alpine/latest-stable/main/" > /etc/apk/repositories && \
  echo "http://mirrors.aliyun.com/alpine/latest-stable/community/" >> /etc/apk/repositories

# Update the package index, install openssh, set the time zone, modify the sshd
# configuration file, generate the host keys and set the root password
RUN apk update && \
  apk add --no-cache openssh-server tzdata && \
  cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
  sed -i "s/#PermitRootLogin.*/PermitRootLogin yes/g" /etc/ssh/sshd_config && \
  ssh-keygen -t rsa -P "" -f /etc/ssh/ssh_host_rsa_key && \
  ssh-keygen -t ecdsa -P "" -f /etc/ssh/ssh_host_ecdsa_key && \
  ssh-keygen -t ed25519 -P "" -f /etc/ssh/ssh_host_ed25519_key && \
  echo "root:h056zHJLg85oW5xh7VtSa" | chpasswd

# Install Scrapy dependency packages (required dependencies)
RUN apk add --no-cache python3 python3-dev gcc openssl-dev openssl libressl libc-dev linux-headers libffi-dev libxml2-dev libxml2 libxslt-dev openssh-client openssh-sftp-server

# Install the required pip packages (packages here can be added or removed as needed)
RUN pip3 install --default-timeout=100 --no-cache-dir --upgrade pip setuptools pymysql pymongo redis scrapy-redis ipython Scrapy requests

# Create the script that starts sshd
RUN echo "/usr/sbin/sshd -D" >> /etc/start.sh && \
  chmod +x /etc/start.sh

# Expose port 22
EXPOSE 22

# Run the ssh startup script when the container starts
CMD ["/bin/sh","/etc/start.sh"]

A container started from this image runs an SSH server, launched by the start.sh script, so you can log in remotely over SSH and use Scrapy in the container's Python 3 environment.
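
To see what the start.sh step produces, the same two commands can be run outside the image. This is only a sketch: `demo_etc` is a hypothetical temporary directory standing in for /etc inside the image, and nothing here starts sshd.

```shell
# Temporary stand-in for /etc inside the image (an assumption for this demo).
demo_etc="$(mktemp -d)"

# Same two commands as the Dockerfile's RUN step, with the path redirected.
echo "/usr/sbin/sshd -D" >> "$demo_etc/start.sh"
chmod +x "$demo_etc/start.sh"

# Show the resulting one-line script.
cat "$demo_etc/start.sh"
```

The -D option keeps sshd in the foreground instead of detaching, which is what keeps the container's main process, and therefore the container, running.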

3. Build the image

Build the image

[root@DockerBrian scrapy]# docker build -t scrapy_redis_ssh:v1 . 

View the image

[root@DockerBrian scrapy]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
scrapy_redis_ssh v1 b2c95ef95fb9 4 hours ago 282 MB
docker.io/alpine latest 196d12cf6ab1 4 weeks ago 4.41 MB

4. Create a container

Create a container (named scrapy10086; the container's SSH port 22 is mapped to host port 10086)

docker run -itd --restart=always --name scrapy10086 -p 10086:22 scrapy_redis_ssh:v1
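
If several such containers are needed, the name-matches-port convention can be captured in a small helper. This is a sketch under stated assumptions: `scrapy_run_cmd` is a hypothetical function, not part of Docker, and it only prints the commands rather than running them.

```shell
# Hypothetical helper: print the docker run command for a given host port.
# The container name follows the article's pattern "scrapy<port>".
scrapy_run_cmd() {
  port="$1"
  image="${2:-scrapy_redis_ssh:v1}"
  printf 'docker run -itd --restart=always --name scrapy%s -p %s:22 %s\n' \
    "$port" "$port" "$image"
}

# Print the commands for three containers; pipe the output to sh to run them.
for p in 10086 10087 10088; do
  scrapy_run_cmd "$p"
done
```

Printing the commands first makes it easy to review the port mappings before anything is started.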

View the container

[root@DockerBrian scrapy]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7fb9e69d79f5 b2c95ef95fb9 "/bin/sh /etc/star..." 3 hours ago Up 3 hours 0.0.0.0:10086->22/tcp scrapy10086

Log in to the container

[root@DockerBrian scrapy]# ssh root@127.0.0.1 -p 10086
The authenticity of host '[127.0.0.1]:10086 ([127.0.0.1]:10086)' can't be established.
ECDSA key fingerprint is SHA256:wC46AU6SLjHyEfQWX6d6ht9MdpGKodeMOK6/cONcpxk.
ECDSA key fingerprint is MD5:6a:b7:31:3c:63:02:ca:74:5b:d9:68:42:08:be:22:fc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[127.0.0.1]:10086' (ECDSA) to the list of known hosts.
root@127.0.0.1's password: # The password is the one set in the Dockerfile by echo "root:h056zHJLg85oW5xh7VtSa" | chpasswd
Welcome to Alpine!
 
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org>.
 
You can setup the system with the command: setup-alpine
 
You may change this message by editing /etc/motd.
 
7363738cc96a:~#
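
To avoid retyping the address and port, an entry can be added to the client's SSH configuration. The sketch below assumes the host, port and user from the session above; `scrapy_ssh_host_entry` is a hypothetical helper that only prints the entry.

```shell
# Hypothetical helper: print an ssh_config Host block for one container.
scrapy_ssh_host_entry() {
  port="$1"
  printf 'Host scrapy%s\n    HostName 127.0.0.1\n    Port %s\n    User root\n' \
    "$port" "$port"
}

# Print the entry; append it to ~/.ssh/config to enable "ssh scrapy10086".
scrapy_ssh_host_entry 10086
```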

5. Testing

Create a Scrapy project named test

7363738cc96a:~# scrapy startproject test
New Scrapy project 'test', using template directory '/usr/lib/python3.6/site-packages/scrapy/templates/project', created in:
  /root/test
 
You can start your first spider with:
  cd test
  scrapy genspider example example.com
7363738cc96a:~# cd test/
7363738cc96a:~/test# ls
scrapy.cfg test
7363738cc96a:~/test# cd test/
7363738cc96a:~/test/test# ls
__init__.py __pycache__ items.py middlewares.py pipelines.py settings.py spiders
7363738cc96a:~/test/test#

The test succeeded: Scrapy created the project skeleton inside the container.
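
The manual ls checks can be turned into a repeatable test. This is a minimal sketch; `check_scrapy_project` is a hypothetical helper that looks only for the files listed in the session above.

```shell
# Hypothetical helper: succeed only if DIR looks like a fresh Scrapy project
# named NAME, i.e. it contains the files listed in the session above.
check_scrapy_project() {
  dir="$1"
  name="$2"
  [ -f "$dir/scrapy.cfg" ] || return 1
  for f in __init__.py items.py middlewares.py pipelines.py settings.py; do
    [ -f "$dir/$name/$f" ] || return 1
  done
  [ -d "$dir/$name/spiders" ]
}

# Example call for the project created above; prints a result either way.
if check_scrapy_project /root/test test; then
  echo "project layout OK"
else
  echo "project layout incomplete"
fi
```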
