1. Introduction

I want to use Selenium to scrape data from a website, but PhantomJS sometimes throws errors. Chrome now has a headless mode, so PhantomJS is no longer needed. However, I ran into some errors when installing Chrome on the server, so here is a summary of the entire installation process.

2. Install Chrome on Ubuntu

    # Install Google Chrome
    # https://askubuntu.com/questions/79280/how-to-install-chrome-browser-properly-via-command-line
    sudo apt-get install libxss1 libappindicator1 libindicator7
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    sudo dpkg -i google-chrome*.deb  # Might show "errors", fixed by the next line
    sudo apt-get install -f

Chrome should now be installed. Test it by running the following command:

    google-chrome --headless --disable-gpu --remote-debugging-port=9222
Here Chrome runs in headless mode with remote debugging enabled. Most Ubuntu servers have no GPU, so --disable-gpu is passed to avoid errors. While Chrome is running, query the debugging port from another terminal:

    curl http://localhost:9222

If the installation succeeded, you will see debugging information. On my server, however, this step reported an error; the solution follows.

1) Possible errors and solutions

After running the command above, you may get an error message saying that Chrome cannot run as root. In that case, modify the Chrome launcher as follows:

1. Find the google-chrome file. On my machine it is in /opt/google/chrome/.
2. Open the google-chrome file with vi:

    vi /opt/google/chrome/google-chrome
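   Find this line in the file (it is near the end):

    exec -a "$0" "$HERE/chrome" "$@"

3. Add --user-data-dir --no-sandbox at the end of that line. The entire shell command then reads:

    exec -a "$0" "$HERE/chrome" "$@" --user-data-dir --no-sandbox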
4. Relaunch Google Chrome; you should now be able to access it normally.

3. Install chromedriver

Download chromedriver

Chromedriver provides an API for controlling Chrome and is the bridge Selenium uses to drive it. It is best to install the latest version of chromedriver, because each chromedriver release supports only a limited range of Chrome versions. I did not install the latest version at first and got an error; the latest version worked without problems. Check the chromedriver download page for the current release. When I wrote this article, it was 2.37:

    wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip

At this point, the headless (GUI-less) version of Chrome is installed on the server.

4. How to use the headless version of Chrome

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36')

    # Selenium 4 takes the chromedriver path through a Service object
    wd = webdriver.Chrome(service=Service('/home/chrome/chromedriver'), options=chrome_options)
    try:
        wd.get("https://www.163.com")
        content = wd.page_source  # already a Unicode str in Python 3
        print(content)
    finally:
        wd.quit()  # shut the browser down even if the page load fails

Here, the third argument added to chrome_options sets a custom user agent. Headless Chrome's default user agent contains "HeadlessChrome", which websites can check for as part of their anti-scraping measures. Overriding it hides the fact that you are running in headless mode. The other two arguments make Chrome run without a window. If they are not set, Chrome opens with its normal user interface on a desktop Linux system or a Mac. When debugging, you can comment out these two lines and debug the program with the windowed Chrome:

    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
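The curl check from section 2 can also be scripted in Python, which is convenient when the server setup runs unattended. This is a minimal sketch under my own assumptions, not part of the original procedure. The helper name devtools_version is hypothetical. It queries the DevTools /json/version endpoint on localhost:9222 (the port used above) and returns the parsed JSON, or None if nothing answers.

```python
import json
import urllib.request


def devtools_version(host="localhost", port=9222, timeout=2.0):
    """Return Chrome's /json/version info as a dict, or None if unreachable.

    Hypothetical helper: roughly equivalent to `curl http://localhost:9222`,
    but it asks the DevTools /json/version endpoint so the reply is JSON.
    """
    url = "http://%s:%d/json/version" % (host, port)
    # The debugging port is local, so bypass any HTTP proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; a bad JSON body is a
        # ValueError. Either way, report "not running".
        return None
```

While the headless Chrome from section 2 is running, print(devtools_version()) should show a dict with a "Browser" entry. Otherwise it prints None.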
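Editing /opt/google/chrome/google-chrome changes Chrome for every user of the machine, and the file is likely to be replaced when the google-chrome package is upgraded. If Chrome is only ever started through Selenium, you can instead pass the switch from the script. The sketch below assumes that approach.

chrome_arguments is a hypothetical helper. It adds --no-sandbox only when the script runs as root, detected with os.geteuid, which exists only on POSIX systems. It also turns the two headless switches from section 4 into a single flag. It leaves out the bare --user-data-dir, on the assumption that chromedriver already starts Chrome with its own temporary profile.

```python
import os


def chrome_arguments(headless=True, user_agent=None, running_as_root=None):
    """Build the Chrome command-line switches used in this article.

    Hypothetical helper. running_as_root=None means "detect it"; os.geteuid()
    only exists on POSIX, so other platforms are treated as non-root.
    """
    if running_as_root is None:
        geteuid = getattr(os, "geteuid", None)
        running_as_root = geteuid is not None and geteuid() == 0

    args = []
    if headless:
        # The two switches from section 4; omit them to get a visible window.
        args += ["--headless", "--disable-gpu"]
    if user_agent:
        args.append("--user-agent=" + user_agent)
    if running_as_root:
        # Chrome refuses to start as root unless the sandbox is disabled.
        args.append("--no-sandbox")
    return args


# Usage with the options object from section 4 (HEADLESS is a hypothetical
# environment variable for switching to a visible window while debugging):
#   for arg in chrome_arguments(headless=os.environ.get("HEADLESS", "1") == "1"):
#       chrome_options.add_argument(arg)
```

Compared with commenting lines in and out, a single flag keeps the debugging and server configurations in one place.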
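The wget step in section 3 leaves a zip archive that still has to be unpacked. The path passed to Selenium in section 4 (/home/chrome/chromedriver) must also point to an executable file. A minimal standard-library sketch of those two steps is below. install_chromedriver and download are hypothetical helper names. The URL is the 2.37 one from the article, which only supports the Chrome versions current at that time.

```python
import os
import stat
import urllib.request
import zipfile

# URL from section 3; 2.37 was the latest release when the article was written.
CHROMEDRIVER_URL = "https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip"


def download(url, zip_path):
    """Save url to zip_path, like the wget command in section 3."""
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def install_chromedriver(zip_path, dest_dir):
    """Extract the chromedriver binary into dest_dir and make it executable.

    Hypothetical helper; returns the path of the extracted binary.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: the archive holds a file whose base name is "chromedriver".
        member = next(
            (name for name in zf.namelist() if os.path.basename(name) == "chromedriver"),
            None,
        )
        if member is None:
            raise FileNotFoundError("no chromedriver binary in " + zip_path)
        target = zf.extract(member, dest_dir)
    # ZipFile.extract does not restore Unix permission bits, so add +x by hand.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


# Usage (needs network access; /home/chrome matches the path used in section 4):
#   install_chromedriver(download(CHROMEDRIVER_URL, "/tmp/chromedriver_linux64.zip"), "/home/chrome")
```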
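The custom user agent in section 4 matters because headless Chrome reports "HeadlessChrome" in its default user-agent string. The hard-coded Chrome 62 string will drift out of date as Chrome is upgraded. One option, which is my assumption and not the article's method, is to read the real user agent once, replace only the HeadlessChrome token, and pass the result back with --user-agent. The reading step can use wd.execute_script("return navigator.userAgent"). The hypothetical helper below shows just the string transformation.

```python
def unheadless_user_agent(user_agent):
    """Return user_agent with the "HeadlessChrome/" token renamed to "Chrome/".

    Hypothetical helper. Everything else in the string (platform, version) is
    kept, so the result still matches the browser that is actually running.
    """
    return user_agent.replace("HeadlessChrome/", "Chrome/")


# Sketch of use with Selenium (wd is a headless webdriver.Chrome, as in section 4):
#   real_ua = wd.execute_script("return navigator.userAgent")
#   chrome_options.add_argument("--user-agent=" + unheadless_user_agent(real_ua))
#   # ...then start a new webdriver.Chrome with these options.
```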