Every website gets traffic from crawlers other than the major search-engine bots. Most of them are content scrapers or scripts written by beginners; unlike search-engine crawlers they apply no rate limiting, so they can consume a lot of server resources and waste bandwidth. Nginx can filter such requests by User-Agent: a simple regular expression at the relevant URL entry point (a location block) is enough to reject crawler requests you do not want to serve:

```nginx
location / {
    # Reject common HTTP libraries and command-line clients by User-Agent
    if ($http_user_agent ~* "python|curl|java|wget|httpclient|okhttp") {
        return 503;
    }
    # Other normal configuration...
}
```

Note: the variable $http_user_agent is a built-in Nginx variable that holds the request's User-Agent header, and the ~* operator makes the regular-expression match case-insensitive.

Blocking web crawlers in Nginx

To block a list of named bots for an entire virtual host, the same check can sit directly in the server block:

```nginx
server {
    listen 80;
    server_name www.xxx.com;

    #charset koi8-r;
    #access_log logs/host.access.log main;

    #location / {
    #    root html;
    #    index index.html index.htm;
    #}

    # Return 403 for the listed crawler User-Agents
    if ($http_user_agent ~* "qihoobot|Baiduspider|Googlebot|Googlebot-Mobile|Googlebot-Image|Mediapartners-Google|Adsbot-Google|Feedfetcher-Google|Yahoo! Slurp|Yahoo! Slurp China|YoudaoBot|Sosospider|Sogou spider|Sogou web spider|MSNBot|ia_archiver|Tomato Bot") {
        return 403;
    }

    # Everything else is proxied to the backend on port 8080
    location ~ ^/(.*)$ {
        proxy_pass http://localhost:8080;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        client_max_body_size 10m;
        client_body_buffer_size 128k;
        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;
        proxy_buffer_size 4k;
        proxy_buffers 4 32k;
        proxy_busy_buffers_size 64k;
        proxy_temp_file_write_size 64k;
    }

    #error_page 404 /404.html;

    # redirect server error pages to the static page /50x.html
    #
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }

    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
    #
    #location ~ \.php$ {
    #    proxy_pass http://127.0.0.1;
    #}

    # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
    #
    #location ~ \.php$ {
    #    root html;
    #    fastcgi_pass 127.0.0.1:9000;
    #    fastcgi_index index.php;
    #    fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
    #    include fastcgi_params;
    #}

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #
    #location ~ /\.ht {
    #    deny all;
    #}
}
```

You can test it with curl (a fuller verification sketch follows the summary):

```shell
curl -I -A "qihoobot" www.xxx.com
```

Summary

The above is the full content of this article. I hope it has some reference value for your study or work. Thank you for your support of 123WORDPRESS.COM.
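For reference, here is a slightly fuller verification sketch. It assumes www.xxx.com is the placeholder domain used in the examples above and that the relevant configuration has been reloaded into Nginx; which status line you actually see depends on which snippet is active.

```shell
# Second config: "qihoobot" matches the blocked bot list, so Nginx should answer 403 Forbidden
curl -I -A "qihoobot" http://www.xxx.com/

# First config: curl's default User-Agent contains "curl", so it should receive a 503
curl -I http://www.xxx.com/

# An ordinary browser User-Agent matches neither pattern and should be served normally
curl -I -A "Mozilla/5.0 (X11; Linux x86_64)" http://www.xxx.com/
```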