Add the anti-crawler policy file:

```bash
vim /usr/www/server/nginx/conf/anti_spider.conf
```

File contents:

```nginx
# Block crawling by tools such as Scrapy
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}

# Block access with the specified User-Agents or an empty User-Agent
if ($http_user_agent ~ "WinHttp|WebZIP|FetchURL|node-superagent|java/|FeedDemon|Jullo|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|Java|Feedly|Apache-HttpAsyncClient|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|BOT/0.1|YandexBot|FlightDeckReports|Linguee Bot|^$") {
    return 403;
}

# Block request methods other than GET|HEAD|POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}

# To block a single IP:
#deny 123.45.6.7;
# To block the entire segment from 123.0.0.1 to 123.255.255.254:
#deny 123.0.0.0/8;
# To block the range from 123.45.0.1 to 123.45.255.254:
#deny 123.45.0.0/16;
# To block the range from 123.45.6.1 to 123.45.6.254:
#deny 123.45.6.0/24;
# The following IPs are all rogue:
#deny 58.95.66.0/24;
```
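As an aside that is not part of the original configuration: since nginx discourages heavy use of `if`, the same User-Agent check is often written with a `map` in the http context instead. A minimal sketch, where `$block_ua` is a variable name made up for this example:

```nginx
# Map the User-Agent to a 0/1 flag once, in the http context
map $http_user_agent $block_ua {
    default                       0;
    "~*(Scrapy|Curl|HttpClient)"  1;  # same case-insensitive tool match as above
    ""                            1;  # empty User-Agent
}

server {
    # ... other directives ...
    if ($block_ua) {
        return 403;
    }
}
```

This keeps the per-request logic down to a single test of a precomputed variable.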
Configuration usage

Include it in the site's server block:

```nginx
# Anti-crawler
include /usr/www/server/nginx/conf/anti_spider.conf;
```

Finally, restart nginx.
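The original does not spell out the restart commands; a minimal sketch, assuming the nginx binary is on the PATH:

```bash
nginx -t          # check the configuration syntax first
nginx -s reload   # then reload workers without dropping existing connections
```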
Verify that it works

Simulating YYSpider:

```
λ curl -X GET -I -A 'YYSpider' https://www.myong.top
HTTP/1.1 200 Connection established

HTTP/2 403
server: marco/2.11
date: Fri, 20 Mar 2020 08:48:50 GMT
content-type: text/html
content-length: 146
x-source: C/403
x-request-id: 3ed800d296a12ebcddc4d61c57500aa2
```

Simulating Baiduspider:

```
λ curl -X GET -I -A 'BaiduSpider' https://www.myong.top
HTTP/1.1 200 Connection established

HTTP/2 200
server: marco/2.11
date: Fri, 20 Mar 2020 08:49:47 GMT
content-type: text/html
vary: Accept-Encoding
x-source: C/200
last-modified: Wed, 18 Mar 2020 13:16:50 GMT
etag: "5e721f42-150ce"
x-request-id: e82999a78b7d7ea2e9ff18b6f1f4cc84
```

The blocked YYSpider receives 403, while the legitimate Baiduspider still gets 200.
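To spot-check several User-Agents in one pass, a small loop like the following also works (the host and the UA list are just examples; substitute your own):

```bash
#!/bin/sh
# Print the HTTP status code returned for each User-Agent
for ua in 'YYSpider' 'AhrefsBot' 'BaiduSpider'; do
  printf '%-12s -> ' "$ua"
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://www.myong.top
done
```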
Common crawler User-Agents

- FeedDemon: content scraping
- BOT/0.1 (BOT for JCE): SQL injection
- CrawlDaddy: SQL injection
- Java: content scraping
- Jullo: content scraping
- Feedly: content scraping
- UniversalFeedParser: content scraping
- ApacheBench: CC attacks
- Swiftbot: useless crawler
- YandexBot: useless crawler
- AhrefsBot: useless crawler
- YisouSpider: useless crawler (it has been acquired by UC Shenma Search, so this spider can be allowed)
- JikeSpider: useless crawler
- MJ12bot: useless crawler
- ZmEu: phpMyAdmin vulnerability scanning
- WinHttp: content scraping, CC attacks
- EasouSpider: useless crawler
- HttpClient: TCP attacks
- Microsoft URL Control: scanning
- YYSpider: useless crawler
- jaunty: WordPress brute-force scanning
- oBot: useless crawler
- Python-urllib: content scraping
- Indy Library: scanning
- FlightDeckReports Bot: useless crawler
- Linguee Bot: useless crawler

This concludes the details of the Nginx anti-crawler strategy for blocking unwanted User-Agents. For more information about Nginx anti-crawler techniques, please pay attention to the other related articles on 123WORDPRESS.COM!