Create a new configuration file (for example, go to the conf directory under the nginx installation directory and create agent_deny.conf). Its rules disable crawling by tools such as Scrapy, prohibit access with a blacklisted UA or an empty UA, and reject request methods other than GET, HEAD, and POST:

    # forbid Scrapy and similar tools
    if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
        return 403;
    }

    # forbid specified UAs and the empty UA
    if ($http_user_agent ~ "Bytespider|FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|^$") {
        return 403;
    }

    # forbid request methods other than GET|HEAD|POST
    if ($request_method !~ ^(GET|HEAD|POST)$) {
        return 403;
    }

Then insert the following line into the server section of the website configuration:

    include agent_deny.conf;

Restart nginx:

    /data/nginx/sbin/nginx -s reload

Testing can be done with curl -A to simulate a crawler. For example, simulate YYSpider:

    curl -I -A 'YYSpider' www.xxx.cn

Result: YYSpider is on the blacklist, so the request is rejected with 403 Forbidden.
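While testing, it can help to watch the access log in a second terminal to confirm that the 403 responses really come from these rules. A minimal example; the log path is an assumption based on the /data/nginx install prefix used above, so adjust it to your layout:

    # follow incoming requests and check the status code served to each UA
    tail -f /data/nginx/logs/access.log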
Simulate a crawl with an empty UA:

    curl -I -A '' www.xxx.cn

Result: the empty UA matches the ^$ alternative in the regex, so this also returns 403 Forbidden.
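As a side note, omitting -A entirely is not the same test: curl then sends its default User-Agent of the form curl/<version>, which is caught by the case-insensitive (Scrapy|Curl|HttpClient) rule rather than by ^$. A quick comparison, with www.xxx.cn standing in for your own host:

    # empty UA: nginx sees an empty $http_user_agent, matching ^$
    curl -I -A '' www.xxx.cn
    # default UA "curl/<version>": blocked by the ~* (Scrapy|Curl|HttpClient) rule
    curl -I www.xxx.cn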
Simulate a crawl by Baidu's spider:

    curl -I -A 'Baiduspider' www.xxx.cn

Result: Baiduspider matches none of the rules, so the request goes through normally (200 OK).
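To check several user agents in one pass, a small shell loop can print the status code each UA receives. This is just a sketch, assuming curl is available and with www.xxx.cn standing in for your own domain:

    #!/bin/sh
    # print the HTTP status nginx returns for each simulated user agent;
    # with the rules above, the first four should print 403 and the last two 200
    for ua in 'YYSpider' 'Scrapy/2.4' 'MJ12bot' '' 'Baiduspider' 'Mozilla/5.0'; do
        code=$(curl -s -o /dev/null -w '%{http_code}' -A "$ua" http://www.xxx.cn/)
        echo "UA='$ua' -> HTTP $code"
    done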
UA reference table:

    UA                      Type
    FeedDemon               content collection
    BOT/0.1 (BOT for JCE)   sql injection
    CrawlDaddy              sql injection
    Java                    content collection
    Jullo                   content collection
    Feedly                  content collection
    UniversalFeedParser     content collection
    ApacheBench             cc attack
    Swiftbot                useless crawler
    YandexBot               useless crawler
    AhrefsBot               useless crawler
    YisouSpider             useless crawler (acquired by UC Shenma Search; this spider can be allowed)
    jikeSpider              useless crawler
    MJ12bot                 useless crawler
    ZmEu                    phpmyadmin vulnerability scanning
    WinHttp                 collection, cc attack
    EasouSpider             useless crawler
    HttpClient              tcp attack
    Microsoft URL Control   scanning
    YYSpider                useless crawler
    jaunty                  wordpress brute-force scanner
    oBot                    useless crawler
    Python-urllib           content collection
    Indy Library            scanning
    FlightDeckReports Bot   useless crawler
    Linguee Bot             useless crawler

Nginx anti-hotlink configuration

Background: to prevent third-party sites from linking directly to our images and consuming our server resources and network traffic, we can apply anti-hotlink restrictions on the server.

Referer-based anti-hotlinking:

Working module: ngx_http_referer_module.
Variable: $invalid_referer (global).
Configuration context: server, location.

Configuration:

    server {
        listen 80;
        server_name www.imcati.com refer-test.imcati.com;
        root /usr/share/nginx/html;
        location ~* \.(gif|jpg|jpeg|png|bmp|swf)$ {
            valid_referers none blocked www.imcati.com;
            if ($invalid_referer) {
                return 403;
            }
        }
    }

Here valid_referers lists the Referer values allowed to fetch the matched files: none allows requests that carry no Referer header at all, blocked allows Referer values that a firewall or proxy has stripped of their scheme (they do not start with http:// or https://), and www.imcati.com allows the site itself. Any other Referer sets $invalid_referer, so the if block answers with 403.
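The referer rule can be verified with curl's -e (--referer) option. A minimal sketch; test.jpg is a hypothetical image assumed to exist under the web root:

    # no Referer header at all: allowed by "none"
    curl -I http://www.imcati.com/test.jpg
    # Referer from the whitelisted domain: allowed
    curl -I -e 'http://www.imcati.com/page.html' http://www.imcati.com/test.jpg
    # Referer from a third-party site: $invalid_referer is set, nginx returns 403
    curl -I -e 'http://other.example.com/' http://www.imcati.com/test.jpg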
This concludes this article on the detailed configuration of nginx anti-hotlinking and anti-crawler rules. For more related nginx anti-hotlink and anti-crawler content, please search for previous articles on 123WORDPRESS.COM. I hope everyone will support 123WORDPRESS.COM in the future!