Detailed tutorial on running Selenium + ChromeDriver on a server


1. Introduction

I want to use Selenium to scrape data from a website, but PhantomJS sometimes throws errors. Chrome now has its own headless mode, so PhantomJS is no longer needed.

However, I ran into some errors when installing Chrome on the server. Below is a summary of the whole installation process.

2. Install Chrome on Ubuntu

# Install Google Chrome
# https://askubuntu.com/questions/79280/how-to-install-chrome-browser-properly-via-command-line
sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb # Might show "errors", fixed by next line
sudo apt-get install -f
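The install can also be confirmed from a script. A minimal sketch, assuming the google-chrome binary installed by the .deb is on PATH: it runs google-chrome --version and extracts the major version number, which is also useful later when choosing a ChromeDriver that matches the installed browser. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from output like 'Google Chrome 66.0.3359.181'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def chrome_major_version(binary: str = "google-chrome") -> Optional[int]:
    """Return Chrome's major version, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # --version prints one line and exits without starting a browser window
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    return parse_major_version(result.stdout)
```

On a server where the package installed correctly, chrome_major_version() should return an integer rather than None.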

Chrome should now be installed. Test it by running the following command:

google-chrome --headless --remote-debugging-port=9222 https://chromium.org --disable-gpu

Here Chrome runs in headless mode with remote debugging enabled. Most Ubuntu servers have no GPU, so --disable-gpu is added to avoid GPU-related errors.
You can then open a second SSH connection to the server and query local port 9222 from the command line:

curl http://localhost:9222

If the installation succeeded, you will see debugging information. In my case, however, Chrome reported an error; the fix is described below.
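The curl check can also be scripted. A minimal sketch, assuming Chrome was started with --remote-debugging-port=9222 as above: it requests the DevTools /json/version endpoint, which returns details about the running browser as JSON, using only the standard library. The function name devtools_version is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def devtools_version(port: int = 9222, timeout: float = 2.0) -> Optional[dict]:
    """Return Chrome's /json/version info, or None if nothing answers on the port."""
    url = f"http://localhost:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError:
        # urllib.error.URLError, refused connections and timeouts all derive from OSError
        return None
```

On a working install, devtools_version() should return a dict whose "Browser" entry names the running Chrome build; None means Chrome is not listening on that port.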

1) Fixing a possible error

After running the above command, you may get an error message saying that Chrome cannot run as root. In that case, change how Chrome is launched as follows:

1. Find the google-chrome file

On my machine it is in /opt/google/chrome/.

2. Open the google-chrome file with vi

vi /opt/google/chrome/google-chrome

Find this line in the file:

exec -a "$0" "$HERE/chrome" "$@"

3. Append --user-data-dir --no-sandbox to the end of that line, so that it reads:

exec -a "$0" "$HERE/chrome" "$@" --user-data-dir --no-sandbox

4. Run Google Chrome again; it should now start normally.
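If you would rather script the edit from steps 2 and 3, for example when setting up several servers, the sketch below does the same thing in Python. It assumes the wrapper contains exactly the exec line shown above, writes a .bak copy before changing anything, and must run as root to write under /opt. patch_wrapper_text and patch_wrapper are hypothetical helper names. Note that a Chrome package upgrade may restore the original wrapper, in which case the patch has to be reapplied.

```python
from pathlib import Path

# Flags taken from step 3 above
EXTRA_FLAGS = "--user-data-dir --no-sandbox"
EXEC_LINE = 'exec -a "$0" "$HERE/chrome" "$@"'


def patch_wrapper_text(text: str) -> str:
    """Append EXTRA_FLAGS to the exec line; already-patched text is left unchanged."""
    patched_lines = []
    for line in text.splitlines(keepends=True):
        stripped = line.rstrip("\n")
        if stripped.strip() == EXEC_LINE:
            newline = line[len(stripped):]  # keep the original line ending, if any
            patched_lines.append(f"{stripped} {EXTRA_FLAGS}{newline}")
        else:
            patched_lines.append(line)
    return "".join(patched_lines)


def patch_wrapper(path: str = "/opt/google/chrome/google-chrome") -> None:
    """Save a .bak copy of the wrapper script, then write the patched version."""
    wrapper = Path(path)
    original = wrapper.read_text()
    wrapper.with_name(wrapper.name + ".bak").write_text(original)
    # Overwriting the existing file keeps its executable permission bits
    wrapper.write_text(patch_wrapper_text(original))
```

Because the patched line no longer matches EXEC_LINE exactly, running the script twice does not append the flags a second time.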

3. Install ChromeDriver

Download chromedriver

ChromeDriver provides an API for controlling Chrome; it is the bridge through which Selenium drives the browser.
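To make the "bridge" concrete: chromedriver runs as an HTTP server (on port 9515 by default) and Selenium sends it JSON commands. The sketch below builds the body of the W3C WebDriver "new session" request that current ChromeDriver releases accept. The 2.x releases used in this article defaulted to the older JSON Wire protocol, so the exact payload there may differ. The helper names are hypothetical; the /session endpoint and the goog:chromeOptions capability come from the WebDriver specification and the ChromeDriver documentation.

```python
import json
import urllib.request

# chromedriver's default address when started without --port
CHROMEDRIVER_URL = "http://localhost:9515"


def new_session_payload(chrome_args):
    """Build the W3C body for POST /session requesting Chrome with the given switches."""
    chrome = {"browserName": "chrome", "goog:chromeOptions": {"args": list(chrome_args)}}
    return {"capabilities": {"alwaysMatch": chrome}}


def post_json(path, payload, base_url=CHROMEDRIVER_URL):
    """POST a JSON body to chromedriver and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a current chromedriver running locally, post_json("/session", new_session_payload(["--headless"])) should start a headless Chrome and return a JSON object whose value.sessionId identifies the new session. Selenium performs essentially this exchange inside webdriver.Chrome(...).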

It is best to install the latest version of ChromeDriver, because each ChromeDriver release supports only a limited range of Chrome versions. At first I did not install the latest version and got an error; with the latest version everything worked. The latest version can be found at:
https://sites.google.com/a/chromium.org/chromedriver/downloads

When I wrote this article, the latest version was 2.37.

wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
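The two commands above can also be run from Python, for example in a provisioning script. A minimal standard-library sketch: it reuses the download URL from the wget command and assumes, as with the 2.x archives, that the zip holds a single chromedriver file at its top level. Unlike the unzip command, Python's zipfile does not restore Unix permission bits, so the sketch sets the execute bits itself. The function names are hypothetical.

```python
import os
import stat
import urllib.request
import zipfile

# URL pattern copied from the wget command above
DOWNLOAD_URL = "https://chromedriver.storage.googleapis.com/{version}/chromedriver_linux64.zip"


def driver_url(version: str) -> str:
    return DOWNLOAD_URL.format(version=version)


def extract_driver(zip_path: str, dest_dir: str) -> str:
    """Unzip chromedriver into dest_dir, mark it executable and return its path."""
    with zipfile.ZipFile(zip_path) as archive:
        extracted = archive.extract("chromedriver", dest_dir)
    mode = os.stat(extracted).st_mode
    # zipfile does not restore Unix permissions, so add the execute bits explicitly
    os.chmod(extracted, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return extracted


def install_driver(version: str, dest_dir: str) -> str:
    """Download the given ChromeDriver version to a temp file and extract it."""
    zip_path, _ = urllib.request.urlretrieve(driver_url(version))
    return extract_driver(zip_path, dest_dir)
```

For example, install_driver("2.37", "/home/chrome") would place the driver at /home/chrome/chromedriver, the path used in the Selenium example below.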

At this point, headless Chrome and ChromeDriver are both installed on the server.

4. Using headless Chrome with Selenium

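from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Custom user agent; no extra quotes, or they become part of the header value
chrome_options.add_argument('--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36')

# Selenium 4 style; older Selenium 3 code passed chrome_options= and executable_path= instead
service = Service('/home/chrome/chromedriver')
wd = webdriver.Chrome(service=service, options=chrome_options)

try:
    wd.get("https://www.163.com")
    content = wd.page_source  # already a str in Python 3, no encode() needed to print it
    print(content)
finally:
    wd.quit()  # always shut down Chrome and chromedriver, even after an error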

Here, the third argument in chrome_options sets a custom user agent. Without it, headless Chrome identifies itself as "HeadlessChrome" in its User-Agent header, which some sites check to detect and block crawlers.

If the other two arguments are left out, Chrome opens with a normal window on desktop Linux or macOS. When debugging, you can comment out these two lines and watch the program drive a visible browser:

chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
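Rather than commenting lines out and back in, the switch can be driven by an environment variable. A minimal sketch, assuming a hypothetical HEADLESS variable where unset or any value other than 0/false/no means headless: chrome_arguments returns the same switches as the example above, so they can be passed to chrome_options.add_argument in a loop.

```python
import os

# Same user agent string as in the example above
USER_AGENT = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")


def chrome_arguments(headless: bool) -> list:
    """Return the Chrome switches for the example, with or without headless mode."""
    args = [f"--user-agent={USER_AGENT}"]
    if headless:
        # The two switches that hide the browser window
        args += ["--headless", "--disable-gpu"]
    return args


def headless_from_env(var: str = "HEADLESS") -> bool:
    """HEADLESS=0/false/no shows the window; anything else, or unset, means headless."""
    return os.environ.get(var, "1").strip().lower() not in {"0", "false", "no"}
```

Usage: for arg in chrome_arguments(headless_from_env()): chrome_options.add_argument(arg). On a desktop machine, starting the script with HEADLESS=0 then opens a visible Chrome window, while the server keeps running headless with no code changes.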

