Nginx anti-crawler strategy: blocking crawlers by User-Agent (UA)


Create an anti-crawler policy file:

vim /usr/www/server/nginx/conf/anti_spider.conf

File Contents

# Block crawling tools such as Scrapy, curl and HttpClient (~* is a case-insensitive match)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}
# Block the listed User-Agents (~ is a case-sensitive match) and requests with an empty User-Agent (^$)
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$") {
    return 403;
}
# Block request methods other than GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
# Block a single IP:
#deny 123.45.6.7;
# Block the whole range 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;
# Block the range 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;
# Block the range 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;
# The following IPs are all rogue:
#deny 58.95.66.0/24;
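Before deploying the file, it can help to check offline which requests these rules would reject. The following is a minimal Python sketch, not part of the original setup, that mirrors the three if blocks with Python's re module. The function name would_block is hypothetical, and the sketch assumes Python's re behaves like nginx's PCRE for these simple patterns (like nginx, it searches for a match anywhere in the string unless the pattern is anchored).

```python
import re

# Rule 1: "~*" in nginx is a case-insensitive regex match
TOOL_UA = re.compile(r"(Scrapy|Curl|HttpClient)", re.IGNORECASE)

# Rule 2: "~" is case-sensitive; the "^$" alternative matches an empty User-Agent
BLOCKED_UA = re.compile(
    "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|"
    "Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|"
    "Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|"
    "Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|"
    "HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|"
    "FlightDeckReports|Linguee Bot|^$"
)

# Rule 3: only GET, HEAD and POST are allowed (the pattern is anchored at both ends)
ALLOWED_METHOD = re.compile(r"^(GET|HEAD|POST)$")


def would_block(user_agent: str, method: str = "GET") -> bool:
    """Return True if the rules in anti_spider.conf would answer with 403."""
    if TOOL_UA.search(user_agent):
        return True
    if BLOCKED_UA.search(user_agent):
        return True
    if not ALLOWED_METHOD.search(method):
        return True
    return False


if __name__ == "__main__":
    samples = [("YYSpider", "GET"), ("BaiduSpider", "GET"),
               ("curl/7.68.0", "GET"), ("", "GET"), ("Mozilla/5.0", "PUT")]
    for ua, method in samples:
        verdict = "403" if would_block(ua, method) else "allowed"
        print(f"{method} {ua!r}: {verdict}")
```

Note that the first rule catches curl itself, so testing with curl only works when -A overrides the default curl/x.y User-Agent, as the commands in the verification step below do.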
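The commented deny examples use CIDR notation: a /8, /16 or /24 suffix fixes the first 8, 16 or 24 bits of the address, and the comments list the usable host addresses in between (nginx's deny blocks the whole block, including the network and broadcast addresses). One way to double-check a range before uncommenting a deny line is Python's standard ipaddress module; the helper name host_range is hypothetical.

```python
import ipaddress
from typing import Tuple


def host_range(cidr: str) -> Tuple[str, str]:
    """Return the first and last usable host address of an IPv4 CIDR block."""
    net = ipaddress.ip_network(cidr)
    # Skip the network address and the broadcast address at either end
    return str(net.network_address + 1), str(net.broadcast_address - 1)


if __name__ == "__main__":
    for cidr in ["123.0.0.0/8", "123.45.0.0/16", "123.45.6.0/24"]:
        first, last = host_range(cidr)
        print(f"deny {cidr};  # {first} to {last}")
    # Membership test: is a given client address covered by a deny range?
    print(ipaddress.ip_address("123.45.6.7") in ipaddress.ip_network("123.45.0.0/16"))
```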

Configuration Usage

Include the file in the site's server block:

# Anti-crawler rules
include /usr/www/server/nginx/conf/anti_spider.conf;

Finally, check the configuration with nginx -t and restart nginx (a reload with nginx -s reload is also enough).

Verify that the rules work

Simulate YYSpider (it is in the block list, so a 403 is expected)

λ curl -X GET -I -A 'YYSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 403
server: marco/2.11
date: Fri, 20 Mar 2020 08:48:50 GMT
content-type: text/html
content-length: 146
x-source: C/403
x-request-id: 3ed800d296a12ebcddc4d61c57500aa2

Simulate Baiduspider (it is not in the block list, so the request gets a 200)

λ curl -X GET -I -A 'BaiduSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 200
server: marco/2.11
date: Fri, 20 Mar 2020 08:49:47 GMT
content-type: text/html
vary: Accept-Encoding
x-source: C/200
last-modified: Wed, 18 Mar 2020 13:16:50 GMT
etag: "5e721f42-150ce"
x-request-id: e82999a78b7d7ea2e9ff18b6f1f4cc84
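To repeat these checks for several User-Agents at once, a small Python script can stand in for curl. This is a sketch using only the standard urllib.request module; the names build_request and probe are hypothetical, and the URL in the usage loop is the one from the curl commands above. urllib raises HTTPError for a 403, so the sketch catches it and reads the status code from the exception.

```python
import urllib.error
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that presents the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent}, method="GET")


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (for example 200 or 403)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions that carry the status code
        return err.code


if __name__ == "__main__":
    for ua in ["YYSpider", "BaiduSpider"]:
        print(ua, probe("https://www.myong.top", ua))
```

When no User-Agent is set, urllib identifies itself as Python-urllib/x.y, so a script that forgets the header is itself caught by the Python-urllib entry in the block list.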

Common crawler User-Agents

  • FeedDemon: content collection
  • BOT/0.1 (BOT for JCE): SQL injection
  • CrawlDaddy: SQL injection
  • Java: content collection
  • Jullo: content collection
  • Feedly: content collection
  • UniversalFeedParser: content collection
  • ApacheBench: CC attacks
  • Swiftbot: useless crawler
  • YandexBot: useless crawler
  • AhrefsBot: useless crawler
  • YisouSpider: useless crawler (acquired by UC Shenma Search; this spider can be allowed)
  • JikeSpider: useless crawler
  • MJ12bot: useless crawler
  • ZmEu: phpMyAdmin vulnerability scanning
  • WinHttp: content collection, CC attacks
  • EasouSpider: useless crawler
  • HttpClient: TCP attacks
  • Microsoft URL Control: scanning
  • YYSpider: useless crawler
  • jaunty: WordPress brute-force scanner
  • oBot: useless crawler
  • Python-urllib: content collection
  • Indy Library: scanning
  • FlightDeckReports Bot: useless crawler
  • Linguee Bot: useless crawler
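As the list grows, writing the alternation for the case-sensitive if ($http_user_agent ~ "...") rule by hand gets error-prone; for example, the dot in BOT/0.1 is a regex wildcard in the rule above, not a literal dot. A minimal sketch, assuming you keep the names in a Python list, that escapes regex metacharacters and joins the entries into the quoted pattern; BLOCKED_AGENTS, escape_pcre and build_ua_pattern are hypothetical names.

```python
import re
from typing import List

# Hypothetical list; fill it with the User-Agent substrings you want to block
BLOCKED_AGENTS = ["FeedDemon", "BOT/0.1", "CrawlDaddy", "Indy Library", "YYSpider"]

# Characters with special meaning in a regex outside a character class
_META = set("\\.^$|?*+()[]{}")


def escape_pcre(text: str) -> str:
    """Backslash-escape regex metacharacters so the text is matched literally."""
    return "".join("\\" + ch if ch in _META else ch for ch in text)


def build_ua_pattern(agents: List[str], block_empty: bool = True) -> str:
    """Join escaped names into one alternation, optionally adding ^$ for empty UAs."""
    parts = [escape_pcre(a) for a in agents]
    if block_empty:
        parts.append("^$")  # also reject requests with an empty User-Agent
    return "|".join(parts)


if __name__ == "__main__":
    pattern = build_ua_pattern(BLOCKED_AGENTS)
    # Double braces in the f-string print literal { and } for the nginx block
    print(f'if ($http_user_agent ~ "{pattern}") {{\n    return 403;\n}}')
    assert re.search(pattern, "BOT/0.1 (BOT for JCE)")
```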

That covers the Nginx anti-crawler rules for blocking crawlers by User-Agent. For more on Nginx anti-crawler techniques, see the related articles on 123WORDPRESS.COM.
