Usually the goal of building a website is to have it indexed by search engines so that it reaches a wider audience. But if a website contains private or confidential pages that should stay out of search results, you need to stop search engines from crawling and indexing it. Taobao is a well-known example of a website that blocks search engine indexing. This article describes several ways to block search engines from crawling and indexing website content.

Search engine spiders crawl the Internet continuously. If a website takes no action to keep them out, it will sooner or later be indexed. The methods below prevent that.

First, the robots.txt method

Search engines comply with the robots.txt protocol by default (although some rogue crawlers ignore it). Create a plain text file named robots.txt, place it in the root directory of the website, and give it two lines: "User-agent: *" followed by "Disallow: /".
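A minimal sketch of the block-all rules, wrapped in Python so they can be checked locally with the standard library's urllib.robotparser before the file is uploaded. The helper name can_crawl and the example.com URLs are placeholders for illustration, not part of the original article.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt that asks every compliant crawler to stay out.
ROBOTS_BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for bot in ("Baiduspider", "Googlebot"):
        print(bot, can_crawl(ROBOTS_BLOCK_ALL, bot, "https://example.com/private/page.html"))
```

Both crawlers should be reported as not allowed. The check only models well-behaved crawlers; it says nothing about crawlers that ignore robots.txt.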
The rules above tell search engines not to crawl or index the website. Use them with care: they bar every compliant search engine from every part of the site.

To block only the Baidu search engine from crawling and indexing the site, edit robots.txt so that it contains "User-agent: Baiduspider" followed by "Disallow: /".
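As a rough illustration, the Baidu-only rules can be checked the same way with urllib.robotparser. The variable names and the example.com URL are assumptions made for this sketch.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that shuts out Baidu's crawler and says nothing about other crawlers.
ROBOTS_BLOCK_BAIDU = """\
User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_BLOCK_BAIDU.splitlines())

# example.com is a placeholder domain.
baidu_allowed = parser.can_fetch("Baiduspider", "https://example.com/index.html")
google_allowed = parser.can_fetch("Googlebot", "https://example.com/index.html")

print("Baiduspider allowed:", baidu_allowed)
print("Googlebot allowed:", google_allowed)
```

Because no group mentions other crawlers, they fall back to the default of being allowed, which is the intended effect of a single-engine block.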
The robots.txt file above blocks all crawling by Baidu. Baiduspider is the user-agent of Baidu's web search crawler. Baidu's other products identify themselves with their own user-agents, such as Baiduspider-image for image search.
You can set different crawling rules for the user-agent of each product. For example, a robots.txt file can block all crawling by Baidu while still allowing image search to crawl the /image/ directory: one group disallows / for Baiduspider, and a second group for Baiduspider-image allows /image/.
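The per-product rules can be sketched and checked as follows. The article's original listing is not available here, so the layout is an assumption: the Baiduspider-image group is listed first and carries its own "Disallow: /" so that the result does not depend on how a parser picks between overlapping groups. Python's urllib.robotparser uses the first group whose token matches the crawler name, which is why the more specific group comes first in this sketch. The helper can_crawl and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: image search may fetch /image/ only; all other Baidu crawling is blocked.
ROBOTS_BAIDU_IMAGE_ONLY = """\
User-agent: Baiduspider-image
Allow: /image/
Disallow: /

User-agent: Baiduspider
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Baiduspider-image", "https://example.com/image/logo.png"),
        ("Baiduspider-image", "https://example.com/news/today.html"),
        ("Baiduspider", "https://example.com/image/logo.png"),
    ]
    for bot, url in checks:
        print(bot, url, can_crawl(ROBOTS_BAIDU_IMAGE_ONLY, bot, url))
```

Real crawlers may resolve overlapping groups differently from Python's parser, so it is worth confirming the behavior against Baidu's own documentation or webmaster tools.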
Please note that pages fetched by Baiduspider-cpro and Baiduspider-ads are not indexed. These crawlers only carry out operations agreed with the customer, so they do not follow the robots protocol. The only way to stop them is to contact Baidu.

To block only the Google search engine from crawling and indexing the site, edit robots.txt so that it contains "User-agent: Googlebot" followed by "Disallow: /".
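Since the single-engine rules differ only in the crawler name, one way to avoid typing them by hand is to generate them. The sketch below is an assumption-laden illustration: block_crawlers is a hypothetical helper, and the output still has to be saved as robots.txt in the site root.

```python
from urllib.robotparser import RobotFileParser


def block_crawlers(tokens: list[str]) -> str:
    """Build robots.txt text that disallows the whole site for each crawler token."""
    groups = [f"User-agent: {token}\nDisallow: /\n" for token in tokens]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n".join(groups)


# Block Google only; pass several tokens to block several engines.
robots_txt = block_crawlers(["Googlebot"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# example.com is a placeholder domain.
google_allowed = parser.can_fetch("Googlebot", "https://example.com/index.html")
baidu_allowed = parser.can_fetch("Baiduspider", "https://example.com/index.html")

print(robots_txt)
print("Googlebot allowed:", google_allowed)
print("Baiduspider allowed:", baidu_allowed)
```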
Second, the web page code method

Add <meta name="robots" content="noarchive"> between <head> and </head> in the source of the website's homepage. This tag asks all search engines not to display a snapshot (cached copy) of the page. On its own, noarchive does not stop the page from being crawled or indexed.

Add <meta name="Baiduspider" content="noarchive"> in the same place to stop only Baidu from displaying snapshots of the page, or <meta name="googlebot" content="noarchive"> to stop only Google.

Some situations need extra attention:

1. The website has a robots.txt file. Why can it still be found in Baidu search?
Updating the search engine index database takes time. Although Baiduspider has stopped accessing the pages on your website, it may take several months for Baidu to clear the index information it has already built. Also check that your robots.txt configuration is correct. If the removal is urgent, you can submit a request through the complaint platform.

2. I want my website content to be indexed by Baidu but not saved as snapshots. What should I do?
Baiduspider complies with the meta robots protocol. You can use a page's meta settings to let Baidu index the page without displaying a snapshot of it in search results. As with robots.txt, the index database takes time to update: if index information for the page already exists in Baidu's database, the change may take two to four weeks to take effect.

3. To be indexed by Baidu without saved snapshots, add <meta name="Baiduspider" content="noarchive"> to the page.

4. To stop all search engines from saving snapshots of your pages, add <meta name="robots" content="noarchive"> to the page.

The content attribute also accepts other commonly used directives, such as noindex and nofollow, and several directives can be combined in one comma-separated value.
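The meta tags discussed in this section can be produced by a small helper, sketched below. The article's original list of combinations is not available here, so the directive set is an assumption limited to widely supported values (index, noindex, follow, nofollow, noarchive), and robots_meta is a hypothetical function name.

```python
from html import escape

# Assumed set of directives; not the article's original list.
KNOWN_DIRECTIVES = {"index", "noindex", "follow", "nofollow", "noarchive"}


def robots_meta(directives: list[str], crawler: str = "robots") -> str:
    """Build a robots meta tag for the <head> of a page.

    crawler is "robots" for all search engines, or a specific name
    such as "Baiduspider" or "googlebot".
    """
    unknown = [d for d in directives if d.lower() not in KNOWN_DIRECTIVES]
    if unknown:
        raise ValueError(f"unknown directive(s): {unknown}")
    content = ",".join(d.lower() for d in directives)
    return f'<meta name="{escape(crawler, quote=True)}" content="{escape(content, quote=True)}">'


if __name__ == "__main__":
    print(robots_meta(["noarchive"]))                         # no snapshots, any engine
    print(robots_meta(["noarchive"], crawler="Baiduspider"))  # no snapshots, Baidu only
    print(robots_meta(["noindex", "nofollow"]))               # keep the page out of the index
```

Unlike robots.txt, these tags only take effect if the crawler is allowed to fetch the page and read its <head>.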
Summary

The above is the full content of this article. I hope it is a useful reference for your study or work. Thank you for your support of 123WORDPRESS.COM.