Implementation of a Scrapy crawler image built with a Dockerfile based on Alpine

1. Download the alpine image

[root@DockerBrian ~]# docker pull alpine
Using default tag: latest
Trying to pull repository docker.io/library/alpine ...
latest: Pulling from docker.io/library/alpine
4fe2ade4980c: Pull complete
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Downloaded newer image for docker.io/alpine:latest
[root@docker43 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/alpine latest 196d12cf6ab1 3 weeks ago 4.41 MB

2. Write Dockerfile

Create a scrapy directory to hold the Dockerfile

[root@DockerBrian ~]# mkdir /opt/alpineDockerfile/
[root@DockerBrian ~]# cd /opt/alpineDockerfile/
[root@DockerBrian alpineDockerfile]# mkdir scrapy && cd scrapy && touch Dockerfile
[root@DockerBrian alpineDockerfile]# cd scrapy/
[root@DockerBrian scrapy]# ll
total 4
-rw-r--r-- 1 root root 1394 Oct 10 11:36 Dockerfile

The Dockerfile is as follows

# Specify the base image
FROM alpine

# Author information
MAINTAINER alpine_python3_scrapy ([email protected])

# Replace the default source with the Alibaba Cloud mirror
RUN echo "http://mirrors.aliyun.com/alpine/latest-stable/main/" > /etc/apk/repositories && \
  echo "http://mirrors.aliyun.com/alpine/latest-stable/community/" >> /etc/apk/repositories

# Update the source, install openssh, adjust the sshd configuration, generate host keys,
# synchronize the time zone and set the root password
RUN apk update && \
  apk add --no-cache openssh-server tzdata && \
  cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
  sed -i "s/#PermitRootLogin.*/PermitRootLogin yes/g" /etc/ssh/sshd_config && \
  ssh-keygen -t rsa -P "" -f /etc/ssh/ssh_host_rsa_key && \
  ssh-keygen -t ecdsa -P "" -f /etc/ssh/ssh_host_ecdsa_key && \
  ssh-keygen -t ed25519 -P "" -f /etc/ssh/ssh_host_ed25519_key && \
  echo "root:h056zHJLg85oW5xh7VtSa" | chpasswd

# Install the system packages Scrapy depends on (required dependencies)
RUN apk add --no-cache python3 python3-dev gcc openssl-dev openssl libressl libc-dev linux-headers libffi-dev libxml2-dev libxml2 libxslt-dev openssh-client openssh-sftp-server

# Install the required pip packages (packages here can be added or removed as needed)
RUN pip3 install --default-timeout=100 --no-cache-dir --upgrade pip setuptools pymysql pymongo redis scrapy-redis ipython Scrapy requests

# Create the ssh startup script
RUN echo "/usr/sbin/sshd -D" >> /etc/start.sh && \
  chmod +x /etc/start.sh

# Open port 22
EXPOSE 22

# Start ssh via the startup script
CMD ["/bin/sh","/etc/start.sh"]

The container exposes SSH, so the Scrapy environment installed under Python 3 can be reached remotely; the SSH service is started by the start.sh script created in the Dockerfile.

3. Create an image

Build the image

[root@DockerBrian scrapy]# docker build -t scrapy_redis_ssh:v1 . 

View the image

[root@DockerBrian scrapy]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
scrapy_redis_ssh v1 b2c95ef95fb9 4 hours ago 282 MB
docker.io/alpine latest 196d12cf6ab1 4 weeks ago 4.41 MB

4. Create a container

Create a container (named scrapy10086; the container's SSH port 22 is mapped to host port 10086)

docker run -itd --restart=always --name scrapy10086 -p 10086:22 scrapy_redis_ssh:v1

View the container

[root@DockerBrian scrapy]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7fb9e69d79f5 b2c95ef95fb9 "/bin/sh /etc/star..." 3 hours ago Up 3 hours 0.0.0.0:10086->22/tcp scrapy10086

Log in to the container

[root@DockerBrian scrapy]# ssh root@127.0.0.1 -p 10086
The authenticity of host '[127.0.0.1]:10086 ([127.0.0.1]:10086)' can't be established.
ECDSA key fingerprint is SHA256:wC46AU6SLjHyEfQWX6d6ht9MdpGKodeMOK6/cONcpxk.
ECDSA key fingerprint is MD5:6a:b7:31:3c:63:02:ca:74:5b:d9:68:42:08:be:22:fc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[127.0.0.1]:10086' (ECDSA) to the list of known hosts.
root@127.0.0.1's password:    # the password here is the one set in the Dockerfile via echo "root:h056zHJLg85oW5xh7VtSa" | chpasswd
Welcome to Alpine!
 
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org>.
 
You can setup the system with the command: setup-alpine
 
You may change this message by editing /etc/motd.
 
7363738cc96a:~#
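
Once logged in, a quick way to confirm that the Python packages baked into the image are importable is a short check script run inside the container. A minimal sketch (the file name check_env.py is only an illustration, not part of the original setup):

# check_env.py - verify the packages installed by the Dockerfile import cleanly
import scrapy
import scrapy_redis
import redis
import pymongo
import pymysql
import requests

print("Scrapy version:", scrapy.__version__)
print("All packages imported successfully")

Run it with python3 check_env.py; if any package failed to install, the corresponding import raises an error immediately.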

5. Testing

Create a Scrapy project named test

7363738cc96a:~# scrapy startproject test
New Scrapy project 'test', using template directory '/usr/lib/python3.6/site-packages/scrapy/templates/project', created in:
  /root/test
 
You can start your first spider with:
  cd test
  scrapy genspider example example.com
7363738cc96a:~# cd test/
7363738cc96a:~/test# ls
scrapy.cfg test
7363738cc96a:~/test# cd test/
7363738cc96a:~/test/test# ls
__init__.py __pycache__ items.py middlewares.py pipelines.py settings.py spiders
7363738cc96a:~/test/test#
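
The generated skeleton can be turned into a runnable crawl with the scrapy genspider example example.com command suggested in the output above. A minimal spider along these lines (the selector and the yielded field are only an illustration) would live in test/test/spiders/example.py:

# test/test/spiders/example.py - a minimal spider to verify crawling works end to end
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse(self, response):
        # yield the page title as a simple item to prove the request/parse cycle works
        yield {"title": response.css("title::text").extract_first()}

It can then be run from the project root with scrapy crawl example -o result.json, which writes the scraped item to result.json.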

The test succeeded.
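
Since scrapy-redis is also installed in the image, a project created inside the container can be switched to distributed crawling by pointing it at a Redis server. A minimal settings sketch, assuming a Redis instance reachable at redis://127.0.0.1:6379 (Redis itself is not part of this image):

# additions to test/test/settings.py - enable scrapy-redis scheduling and deduplication
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True                # keep the request queue between runs
REDIS_URL = "redis://127.0.0.1:6379"    # assumed address; point this at a real Redis server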

The above is the full content of this article. I hope it will be helpful for everyone’s study. I also hope that everyone will support 123WORDPRESS.COM.
