Add an anti-crawler policy file:

```
vim /usr/www/server/nginx/conf/anti_spider.conf
```

File contents

```nginx
# Block crawling by tools such as Scrapy
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}

# Block access from the listed UAs and from empty UAs
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$") {
    return 403;
}

# Block request methods other than GET|HEAD|POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}

# To block a single IP:
#deny 123.45.6.7;

# To block the whole segment from 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;

# To block the range from 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;

# To block the range from 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;

# The following IPs are all rogue
#deny 58.95.66.0/24;
```

Configuration usage

Include the policy file in the site's server block (a fuller server-block sketch appears at the end of this article):

```nginx
# Anti-crawler
include /usr/www/server/nginx/conf/anti_spider.conf;
```

Finally, reload nginx so the change takes effect.

Verify that it works

Simulating YYSpider:

```
λ curl -X GET -I -A 'YYSpider' https://www.myong.top
HTTP/1.1 200 Connection established

HTTP/2 403
server: marco/2.11
date: Fri, 20 Mar 2020 08:48:50 GMT
content-type: text/html
content-length: 146
x-source: C/403
x-request-id: 3ed800d296a12ebcddc4d61c57500aa2
```

Simulating Baiduspider:

```
λ curl -X GET -I -A 'BaiduSpider' https://www.myong.top
HTTP/1.1 200 Connection established

HTTP/2 200
server: marco/2.11
date: Fri, 20 Mar 2020 08:49:47 GMT
content-type: text/html
vary: Accept-Encoding
x-source: C/200
last-modified: Wed, 18 Mar 2020 13:16:50 GMT
etag: "5e721f42-150ce"
x-request-id: e82999a78b7d7ea2e9ff18b6f1f4cc84
```

YYSpider matches the UA rule and receives 403, while Baiduspider is not on the blocklist and receives 200 as before.

Common crawler User-Agents

- FeedDemon: content collection
- BOT/0.1 (BOT for JCE): SQL injection
- CrawlDaddy: SQL injection
- Java: content collection
- Jullo: content collection
- Feedly: content collection
- UniversalFeedParser: content collection
- ApacheBench: CC attack
- Swiftbot: useless crawler
- YandexBot: useless crawler
- AhrefsBot: useless crawler
- YisouSpider: useless crawler (now owned by UC Shenma Search; this spider can be allowed)
- jikeSpider: useless crawler
- MJ12bot: useless crawler
- ZmEu: phpMyAdmin vulnerability scanning
- WinHttp: content collection, CC attack
- EasouSpider: useless crawler
- HttpClient: TCP attack
- Microsoft URL Control: scanning
- YYSpider: useless crawler
- jaunty: WordPress brute-force scanner
- oBot: useless crawler
- Python-urllib: content collection
- Indy Library: scanning
- FlightDeckReports Bot: useless crawler
- Linguee Bot: useless crawler
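As mentioned under configuration usage, the include line sits inside the site's server block. Here is a minimal sketch of that context; the listen port, server_name, and root are hypothetical placeholders rather than values from the original article:

```nginx
server {
    listen 80;
    # server_name and root below are placeholders for illustration only
    server_name www.example.com;
    root /usr/www/html;

    # Anti-crawler: pull in the UA, request-method, and IP rules defined above
    include /usr/www/server/nginx/conf/anti_spider.conf;

    location / {
        index index.html;
    }
}
```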
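For the reload step, a quick sketch: `nginx -t` and `nginx -s reload` are standard nginx commands, while the systemctl variant assumes a systemd-managed install (the article does not specify the platform):

```sh
# Check the edited configuration for syntax errors first
nginx -t
# Then reload without dropping live connections
nginx -s reload
# Or, on systemd-based systems:
# systemctl reload nginx
```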
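Beyond the two curl checks above, a small shell loop can spot-check several blocked UAs at once. The domain is the article's demo site; the loop itself and the expected 403s are an illustration, not part of the original write-up:

```sh
# Print the HTTP status code returned for a handful of blocked User-Agents
for ua in 'YYSpider' 'Scrapy' 'ApacheBench' 'MJ12bot' 'ZmEu'; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -A "$ua" https://www.myong.top)
  echo "$ua -> $code"   # expect 403 for every UA matched by anti_spider.conf
done
```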