The Deep Web


No matter how good you are at using web search engines, there are valuable resources on the web that search engines will not find for you.

This unindexed material is often called the deep web. Knowing about it is important because it contains a great deal of useful information, such as article databases, datasets, statistics and government documents.

Search engines do not index certain web content mainly for the following reasons:

1. The search engine does not know about the page because no one has submitted the URL to the search engine and no page it already knows about links to it.

2. The search engine has decided not to index the content because it is too deep in the site.

3. The site owner has asked the search engine not to index the content by placing a robots.txt file on the site. (In effect, the owner is saying the content is nobody's business.)

4. The search engine does not have or use the technology required to index non-HTML content. This applies to images, audio, video and a few other file types.

5. The search engine cannot reach the pages because the site asks for a password, or because the content appears only after a search box has been filled out.
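
To make reason 3 concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the crawler name "ExampleBot", the example.org URLs and the may_crawl helper are all invented for illustration and do not come from any real site or search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
# It asks every crawler ("*") to stay out of the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is an assumed crawler name, not a real search engine.
print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.org/private/report.html"))
print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.org/index.html"))
```

A crawler that honors these rules skips everything under /private/, so those pages never reach the search engine's index even though anyone with the URL can still open them in a browser.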

Sources:

Hock, Randolph. The Extreme Searcher's Internet Handbook: A Guide for the Serious Searcher. Medford, NJ: CyberAge Books, 2009.