Search Engines

=The major search engines differ in several ways:=


 * size of index
 * search features supported (many search engines support the same features but require different syntax to initiate them)
 * how frequently the database is updated
 * ranking algorithms
 * how deeply each Web site is indexed

=How Search Engines Work=


 * in search engines, a computer program called a **spider** or **robot** gathers new documents from the WWW; the program retrieves hyperlinks that are attached to these documents, loads them into a **database**, and indexes them using a formula that differs from database to database
 * when you search the search engine, it searches the database looking for documents that contain the **keywords** you used in the **search expression**
 * no search engine actually indexes the entire Web
 * there is information that is inaccessible to search engines; this is referred to as the **invisible Web**; much of this content can be located in special databases
 * although robots have many different ways of collecting information from Web pages, the major search engines all claim to index most of the text of each Web document in their databases; this is called **full-text indexing**
 * in some search engines, the robot skips over words that appear often, such as prepositions and articles; these are known as **stop words**
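
The indexing and keyword lookup described above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection of pages and a made-up stop-word list; the names `STOP_WORDS`, `build_index`, and `search` are hypothetical, and real search engines use far more elaborate (and proprietary) formulas.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; each engine chooses its own (or uses none).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and"}

def tokenize(text):
    """Lowercase the text and split it into words, dropping stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, expression):
    """Return the URLs that contain every keyword in the search expression."""
    keywords = tokenize(expression)
    if not keywords:
        return set()
    results = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        results &= index.get(word, set())
    return results

# Made-up documents standing in for pages gathered by a spider.
pages = {
    "example.org/a": "The effects of global warming on the oceans",
    "example.org/b": "Global trade in the twentieth century",
}
index = build_index(pages)
print(search(index, "global warming"))  # {'example.org/a'}
```

Note that the search consults only the index, never the pages themselves, which is why a page that changed after the spider's last visit can still appear under its old keywords.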

=How a Spider Works=


 * spiders gather this information automatically, at intervals that differ from service to service
 * some portions of a search engine's database may not get updated for a few weeks
 * some robots are intuitive; they know which words are important to the meaning of the entire Web page, and some of them can find synonyms to the words and add them to the index
 * some full-text databases use robots that enable them to search on concepts as well as on the search query words
 * some Web page authors include **meta-tags** as part of the HTML code in their pages; meta-tags contain keywords that describe the content and purpose of a Web page, but may not appear on the page; meta-tags allow Web pages that don't contain a lot of text to come up in a keyword search
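
As a rough illustration of how a robot might read meta-tag keywords, the sketch below uses Python's standard `html.parser` module. The class name `MetaKeywordParser`, the helper `extract_meta_keywords`, and the sample page are assumptions made for this example; whether a given search engine actually honors keyword meta-tags varies.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collects the keywords listed in <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )

def extract_meta_keywords(html):
    """Return the list of meta-tag keywords found in an HTML document."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords

# A page with almost no visible text can still be described by its meta-tags.
sample = (
    '<html><head><meta name="keywords" content="Genealogy, Family History">'
    '</head><body><img src="tree.png"></body></html>'
)
print(extract_meta_keywords(sample))  # ['genealogy', 'family history']
```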

=Search Features Common to Most Search Engines=


 * each search engine has its own way of interpreting and manipulating search expressions; in addition, many search engines have **default settings** that you may need to override if you want to obtain the most precise results
 * because a search can bring up so many Web pages, it is very easy to have a lot of hits with few that are relevant to your query; this is called **low precision/high recall**
 * search engines support many search features, though not all engines support each one; if they do support certain features, they may use different **syntax** in expressing the feature; before you use any of these search features, you need to check the search engines' help pages to see how the feature is expressed or if it is supported at all


**Boolean Operators**


**Implied Boolean Operators**


 * implied Boolean operators, also known as pseudo-Boolean operators, are shortcuts to typing AND and NOT; in the search engines that support this feature, you type + before a word or phrase you want to include and - before a word or phrase that you want to exclude
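
The + and - shortcuts can be sketched as a simple filter. This is only an illustration under simplifying assumptions: it handles single words rather than phrases, the function name `matches` is hypothetical, and terms without a prefix are ignored, whereas a real engine would apply its own default to them.

```python
import re

def matches(expression, text):
    """Return True if the text satisfies the +word / -word terms of the expression."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    for term in expression.lower().split():
        if term.startswith("+") and term[1:] not in words:
            return False  # a required word is missing
        if term.startswith("-") and term[1:] in words:
            return False  # an excluded word is present
    return True

print(matches("+jaguar -car", "The jaguar is a large cat"))  # True
print(matches("+jaguar -car", "Jaguar car dealership"))      # False
```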


**Phrase Searching**


 * a **phrase** is a string of words that must appear next to each other; most search engines require double quotation marks to differentiate a phrase from words searched by themselves, e.g. "//global warming//" will find pages on global warming, while //global warming// will find you pages with the words //global// and //warming//


**Proximity Searching**


 * **proximity operators** are words such as //near// or //within//; for example, you are trying to find information on the effects of chlorofluorocarbons on global warming; you might want to retrieve results that have the word //chlorofluorocarbons// very close to the phrase "global warming"; by placing the word NEAR or WITHIN between the two segments of the search expression, you would achieve more relevant results than if the words appeared in the same document but were perhaps pages apart.
 * some search tools that use this operator allow a **W/# of words** between the two segments to indicate how close the two words need to be, so a search phrase such as //Hillary W/2 Clinton// would look for the two words to occur not more than two words apart, allowing, for example, for both "Hillary Clinton" and "Hillary Rodham Clinton" to be returned in the search
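
A W/# proximity test can be sketched as follows. The sketch assumes that "not more than two words apart" means the word positions differ by at most two, which is one reading that admits both examples above; individual search tools may count differently, and the function name `within` is hypothetical.

```python
import re

def within(text, first, second, max_distance):
    """Return True if the two words occur at most max_distance positions apart."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

print(within("Hillary Clinton spoke today", "Hillary", "Clinton", 2))         # True
print(within("Hillary Rodham Clinton spoke today", "Hillary", "Clinton", 2))  # True
print(within("Hillary met with President Bill Clinton", "Hillary", "Clinton", 2))  # False
```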


**Truncation**


 * **truncation** looks for multiple forms of a word, also known as **stemming**
 * for example, to research postmodern art, you might want to retrieve all the records that had the root word //postmodern//, such as //postmodernist// and //postmodernism//
 * most search engines support truncation by allowing you to place an asterisk (*) at the end of the root word
 * some search engines automatically truncate so that the search term postmodern would return postmodernist and postmodernism; in these cases, truncation is a default setting of the search engines


**Wildcards**


 * using **wildcards** allows you to search for words that have most of the letters in common; for example, to search for both //woman// and //women//, instead of typing **woman OR women**, we place a wildcard character (most often an asterisk) to replace the fourth letter, like this **wom*n**
 * in addition to searching for both the American and British spellings of certain words, wildcards are also useful when searching for words that are commonly misspelled; for example, take the word //genealogy//: by placing the wildcard character where the commonly mistaken letters fall, like this: **gen*logy**, you retrieve documents whether the word is spelled correctly or not
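
Truncation and wildcard matching can both be imitated with Python's standard `fnmatch` module, in which `*` stands for any run of characters (including none). That behavior is an assumption of this sketch; some search engines treat the wildcard as exactly one character. The helper `expand` and the sample vocabulary are made up for illustration.

```python
from fnmatch import fnmatchcase

def expand(pattern, vocabulary):
    """Return the words in the vocabulary that match a truncated or wildcard term."""
    return sorted(w for w in vocabulary if fnmatchcase(w.lower(), pattern.lower()))

# Hypothetical list of words found in an index.
vocabulary = [
    "postmodern", "postmodernism", "postmodernist", "modern",
    "woman", "women", "genealogy", "geneology",
]

print(expand("postmodern*", vocabulary))  # ['postmodern', 'postmodernism', 'postmodernist']
print(expand("wom*n", vocabulary))        # ['woman', 'women']
print(expand("gen*logy", vocabulary))     # ['genealogy', 'geneology']
```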


**Field Searching**


 * Web pages can be broken down into many parts; these parts, or **fields**, include titles, URLs, text, summaries or annotations (if present), and so forth
 * field searching is the ability to limit your search to certain fields
 * this ability to search by field can increase the relevance of the retrieved records
 * in Google for example, you can search for Web pages that contain certain words in the title of the page by typing **allintitle:obama afghanistan taliban**
 * you can also limit your search to a specific domain, such as educational institutions (.edu), commercial sites (.com), and so forth
 * in addition, a search can be limited to a particular host, such as a company or institution Web site
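
To make the idea of fields concrete, the sketch below filters a small set of records by title words and by domain. The record layout (`title` and `url` keys), the function name `field_search`, and the sample pages are all assumptions for illustration; they do not reflect how any particular search engine stores its fields.

```python
from urllib.parse import urlparse

def field_search(pages, title_words=(), domain=None):
    """Return URLs whose title contains every given word and whose host ends with domain."""
    results = []
    for page in pages:
        title = page["title"].lower().split()
        if not all(word.lower() in title for word in title_words):
            continue
        host = urlparse(page["url"]).hostname or ""
        if domain and not host.endswith(domain):
            continue
        results.append(page["url"])
    return results

# Made-up records standing in for indexed Web pages.
pages = [
    {"title": "Obama on Afghanistan", "url": "https://www.example.edu/policy"},
    {"title": "Obama on Afghanistan", "url": "https://www.example.com/news"},
    {"title": "Campus events", "url": "https://www.example.edu/events"},
]

print(field_search(pages, title_words=["obama", "afghanistan"], domain=".edu"))
# ['https://www.example.edu/policy']
```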


**Language Searching**


 * the ability to limit results to a specific language can be useful
 * several search engines support this feature


**Searching by File Format Type**


 * the ability to search for files of a particular file type can also be useful
 * for example, if you are looking for PowerPoint presentations, you would type in **environment filetype:ppt**
 * in Yahoo!, the search would look like this: **environment originurlextension:ppt**


**Link Searching**


 * this feature allows you to search for sites that link to a particular URL
 * in Google and Yahoo! you type link: before the URL that you are searching for; for example, if you want to see all the Web sites that link to Wikipedia's main page, you would enter **link:en.wikipedia.org**


**Limiting by Date**


 * some search engines allow you to search for pages that were added or modified between certain dates

=Output Features Common to Most Search Engines=


**Results Ranking**


 * many search engines measure each Web page's relevance to your search query and arrange the search results from the most relevant to the least relevant; this is called **relevancy ranking**
 * each search engine has its own algorithm for determining relevance, but it usually involves counting how many times the words in your query appear in the Web pages
 * in some search engines, a document is considered more relevant if the words appear in certain fields, such as the title or summary field
 * in other search engines, relevance is determined by the number of times the keyword appears in a Web page divided by the total number of words in the page; this gives a percentage, and the page with the largest percentage appears first on the list
 * most search engines also determine relevancy by how many Web pages link to a page or by how many people have accessed that page in response to similar questions in the past
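
The percentage-based ranking described above (keyword occurrences divided by total words) can be sketched in a few lines. This is a simplified illustration only: the function names `keyword_density` and `rank` are hypothetical, and actual ranking algorithms combine many more signals, such as fields and links.

```python
import re

def keyword_density(query, text):
    """Return the share of words in the text that are keywords from the query."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    keywords = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)

def rank(query, pages):
    """Return the page URLs ordered from most to least relevant."""
    return sorted(pages, key=lambda url: keyword_density(query, pages[url]), reverse=True)

# Made-up pages for illustration.
pages = {
    "example.org/run": "warming up before a long run in the park",
    "example.org/climate": "global warming and global trade",
}

print(rank("global warming", pages))  # ['example.org/climate', 'example.org/run']
```

Here the climate page scores 3 keyword hits out of 5 words, while the running page scores 1 out of 9, so the climate page is listed first.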


**Annotations or Summaries**


 * some search engines include short descriptive paragraphs of each Web page they return to you
 * these annotations or summaries can help you decide whether you should open a Web page, especially if there is no title for the Web page or if the title doesn't describe the page in detail


**Results Per Page**


 * in some search engines, the **results per page** option allows you to choose how many results you want listed per page


**Meta-tag Support**


 * some search engines acknowledge keywords that a Web page author has placed in the meta-tag field of the HTML source document; this means that a document may be retrieved by a keyword search, even though the search expression may not appear in the document

=Search Tips=

If you feel your search has yielded too few Web pages (low recall), there are several things to consider:


 * Perhaps the search expression was too specific; go back and remove some terms that are connected by ANDs
 * Perhaps there are more terms to use. Think of more synonyms to OR together. Try truncating more words if possible.
 * Check spelling and syntax.
 * Read the instructions on the help page again

If your search has yielded too many results and many are unrelated to your topic (high recall/low precision), consider the following:


 * Narrow your search to specific fields if possible
 * Use more specific terms; for example, instead of //cancer//, use the specific type of cancer in which you are interested
 * Add additional terms with AND or NOT
 * Remove some synonyms if possible
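
The terms precision and recall used in these tips have standard definitions: precision is the share of retrieved pages that are relevant, and recall is the share of all relevant pages that were retrieved. The sketch below computes both for a made-up result set; the function name `precision_recall` and the page labels are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved pages and a set of relevant pages."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Ten pages come back, but only two of the three relevant pages are among them.
retrieved = [f"p{i}" for i in range(1, 11)]
relevant = {"p1", "p2", "p11"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.20 recall=0.67
```

Adding terms with AND usually raises precision at the cost of recall, while adding synonyms with OR does the reverse.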

=Search Engines: The Basics=

search engines differ from directories in that they are much larger, containing billions instead of a few million web sites

there is virtually no human selectivity involved in determining what webpages are included in the search engine's database

they are designed for searching rather than for browsing, so they provide more substantial searching capabilities than directories

no single search engine covers everything due to differences in crawling and indexing

and each search engine only finds a fraction of the web pages that exist for any topic

each engine includes webpages that others do not so when you use a different search engine, you are actually searching a slightly different range of the web

if you search a second, third and fourth engine, you will get different results as each has unique ranking algorithms

be aware of "sponsored links" on your results page

they are easily identifiable by being put off to the side of the page or surrounded by a blue background

capitalization is ignored

the order of words in your **query** may matter in terms of how Google ranks the results. Try placing your more important search terms at the beginning of your query



Typical Search Options

**Phrase searching**

use quotation marks to search for exactly that combination of words and in that order, e.g. "John Lennon"


**Title searching**

use **intitle**:antioxidants to search for the word you want in the title of the page, or summit intitle:nato to combine a word with a word you want in the title

use **allintitle:** to specify that all words after the colon be in the title but not necessarily in that order, e.g. allintitle:nato preparedness


**URL, Site, and Domain Searching**

internship **site:fbi.gov**

internship **site:baltimore.fbi.gov**

members **inurl:ala**


**Link Searching**


**link:**mla.org


**Language Searching**

you can limit your retrieval to pages written in a given language


**Searching by Date**

search engines may use the date when the page was last modified or the date on which the page was last crawled

it is often impossible to determine a definitive "date created" or the "date of publication" of the content of the page


**Searching by File Type**


**filetype:**pdf or **filetype:**doc

**Searching for Related (Similar) Pages**


**related:**searchengineland.com


**Searching by Other Prefixes**


**cache:**biography.about.com


**info:**cyndislist.com - will give you info about the site


**stocks:**csco - enter a stock symbol after the colon to get stock quotes


**define:**antidisestablishmentarianism - finds definitions of words on the internet


**safesearch:**underwear - filters out adult content


**inanchor:** - the clickable text of a link will contain the word you enter after the colon


**allinanchor:**extreme searcher

american pottery **numrange:**1900..1920 - use to find pages that contain a number that is within the range you specify


**"Wildcard" Words**

"Franklin * Roosevelt" will search for records containing Franklin D. Roosevelt


**Synonym Searches**

apples children ~nutrition - Google automatically recognizes and retrieves words that it considers "synonyms"


**Search suggestions**

Google will offer you search suggestions as you type your query


**Calculator**

you can use the Google search box for quick arithmetic calculation - use + - * / and ^ (for exponents)


**Metric-Imperial Conversions**

e.g. 32 feet to meters, 30 km to miles, 8 liters to quarts, 68 f to c (Fahrenheit to Celsius)

=Metasearch Engines=

let you search several search engines at once but they have some shortcomings

they may not cover every major search engine, such as Google, Yahoo!, Bing, and Ask.com

most only return the first 10 or 20 records from each source

most search syntax does not work, so you cannot, for example, search by title or URL; some do not even recognize "phrase searching"

some present paid listings first

metasearch engines are not the solution for an exhaustive search - searching several different search engines one at a time will give you better results



**Specialty Search Engine:**

search for magazine articles


=Search Engines: The Specifics=


**Google**

search box - enter one or more words. Use a minus sign in front of a term to exclude that term (Boolean NOT). You can also use OR, as well as several prefixes such as intitle:.

Google will ignore small common words unless you insert a plus sign immediately in front of the word (e.g., +the)

images - one of the largest image databases on the web

video - a search of video that appears in YouTube plus video from other sites

maps - maps, directions and yellow pages, driving directions, walking directions and directions for public transit and satellite images

news - covers 25,000 English-language news sources going back 30 days (plus a news archives search going back much further)

shopping - shopping database

gmail - free webmail service

click on more to find more services that google has to offer

books - a search of the full-text of millions of books, plus the ability to view actual pages from many

scholar - a search of scholarly literature from peer-reviewed journals, preprints, theses, books, etc.

advanced search link

language tools - translate both your search terms and your search results into any of 42 languages

translate text or a webpage from one language to another for 42 languages

display the google interface in any one of 129 languages

link to the google country-specific versions for 174 countries

I'm feeling lucky - will take you directly to the first page in your results list instead of giving you a results list

about Google is Google's link to a range of Google's offerings and tools

search settings will allow you to block adult content, choose a language and interface preference, limit the number of results per page, option to have results open in new window

Advanced search - includes boxes to perform simple Boolean combinations, choice of how many results per page, choice for searching in a specific language, option to retrieve only a specific file format, box for limiting to a specific site or domain

click on "date, usage rights, numeric range, and more" to see more advanced search options


**Sources:**

Barker, Donald I. and Carol D. Terry (2009). //Internet research//. Boston, MA: Course Technology.

Hartman, Karen and Ernest Ackerman (2010). //Searching and researching on the Internet and the World Wide Web//. Sherwood, OR: Franklin, Beedle & Associates.

Hock, Randolph (2009). //The extreme searcher's Internet handbook: a guide for the serious searcher//. Medford, NJ: CyberAge Books.