Nginx anti-crawler strategy to prevent UA from crawling websites

Create an anti-crawler policy file:

vim /usr/www/server/nginx/conf/anti_spider.conf

File Contents

# Block crawling by tools such as Scrapy (case-insensitive match)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}
# Block requests with one of the listed UAs or an empty UA (case-sensitive match)
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$" ) {
    return 403;
}
# Block request methods other than GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
# Block a single IP:
#deny 123.45.6.7;
# Block the whole range from 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;
# Block the range from 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;
# Block the range from 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;
# The following IPs are all rogue:
#deny 58.95.66.0/24;
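
Before loading the rules into nginx, the two User-Agent patterns can be tried out locally. The following is only a rough sketch and not part of the original setup: the function name ua_blocked is hypothetical, the second pattern is shortened to a few entries from the list above, and grep -E merely approximates nginx's PCRE matching (for plain alternations like these the two should agree).

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given User-Agent would be rejected
# by the two UA rules in anti_spider.conf, 1 otherwise.
ua_blocked() {
    # Rule 1: case-insensitive match, like nginx's "~*" operator.
    if printf '%s\n' "$1" | grep -Eiq 'Scrapy|Curl|HttpClient'; then
        return 0
    fi
    # Rule 2: case-sensitive match, like nginx's "~" operator.
    # Abbreviated list (assumption); "^$" matches an empty User-Agent.
    if printf '%s\n' "$1" | grep -Eq 'YYSpider|MJ12bot|AhrefsBot|Python-urllib|^$'; then
        return 0
    fi
    return 1
}

# Try a few sample values, including an empty User-Agent.
for ua in 'YYSpider' 'Baiduspider' 'curl/7.68.0' ''; do
    if ua_blocked "$ua"; then
        echo "blocked: '$ua'"
    else
        echo "allowed: '$ua'"
    fi
done
```

Note that the first rule also rejects curl's default User-Agent, which is why the verification commands further down pass an explicit one with -A.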

Configuration Usage

Include the file in the site's server block:

# Anti-crawler
include /usr/www/server/nginx/conf/anti_spider.conf;

Finally, restart nginx.
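
A full restart works, but a reload is usually enough to pick up the new include. Below is a minimal sketch, assuming the nginx binary is on the PATH and the script runs with sufficient privileges. The reload_nginx function and the NGINX_BIN variable are illustrative names, not from the original setup. It runs nginx -t first, so a syntax error in anti_spider.conf is caught before the running server is touched.

```shell
#!/bin/sh
# NGINX_BIN is a hypothetical override; it defaults to "nginx" on the PATH.
NGINX_BIN="${NGINX_BIN:-nginx}"

reload_nginx() {
    # "nginx -t" tests the configuration, including included files.
    if ! "$NGINX_BIN" -t; then
        echo "configuration test failed; nginx was not reloaded" >&2
        return 1
    fi
    # "nginx -s reload" signals the running master process to re-read the config.
    "$NGINX_BIN" -s reload
}

# Usage (typically as root):
#   reload_nginx
```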

Verify that it works

Simulate YYSpider

λ curl -X GET -I -A 'YYSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 403
server: marco/2.11
date: Fri, 20 Mar 2020 08:48:50 GMT
content-type: text/html
content-length: 146
x-source: C/403
x-request-id: 3ed800d296a12ebcddc4d61c57500aa2

Simulate Baiduspider

λ curl -X GET -I -A 'BaiduSpider' https://www.myong.top
HTTP/1.1 200 Connection established
HTTP/2 200
server: marco/2.11
date: Fri, 20 Mar 2020 08:49:47 GMT
content-type: text/html
vary: Accept-Encoding
x-source: C/200
last-modified: Wed, 18 Mar 2020 13:16:50 GMT
etag: "5e721f42-150ce"
x-request-id: e82999a78b7d7ea2e9ff18b6f1f4cc84
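
The two checks above can be folded into a small helper that prints only the status code for a given User-Agent. This is a sketch under assumptions: status_for_ua is a hypothetical name, and the URL in the usage comment is simply the one from the test above, so substitute your own site.

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP status code returned for a given UA.
#   $1 = URL, $2 = User-Agent string
# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints only the status code, -A sets the User-Agent.
status_for_ua() {
    curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1"
}

# Usage: blocked UAs should report 403, allowed ones 200.
#   for ua in 'YYSpider' 'Baiduspider'; do
#       printf '%s -> %s\n' "$ua" "$(status_for_ua https://www.myong.top "$ua")"
#   done
```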

Common crawler User-Agents

User-Agent                 Description
FeedDemon                  content collection
BOT/0.1 (BOT for JCE)      SQL injection
CrawlDaddy                 SQL injection
Java                       content collection
Jullo                      content collection
Feedly                     content collection
UniversalFeedParser        content collection
ApacheBench                CC attack
Swiftbot                   useless crawler
YandexBot                  useless crawler
AhrefsBot                  useless crawler
YisouSpider                useless crawler (now acquired by UC Shenma Search; this spider can be allowed)
jikeSpider                 useless crawler
MJ12bot                    useless crawler
ZmEu                       phpMyAdmin vulnerability scanning
WinHttp                    content collection, CC attack
EasouSpider                useless crawler
HttpClient                 TCP attack
Microsoft URL Control      scanning
YYSpider                   useless crawler
jaunty                     WordPress brute-force scanner
oBot                       useless crawler
Python-urllib              content collection
Indy Library               scanning
FlightDeckReports Bot      useless crawler
Linguee Bot                useless crawler
