
Module Four:
How does a search engine work?

A search engine may be broken down into three major components: the spider (also called a crawler or 'bot), the catalog (or index), and the sorter. The spider is an automated program that follows links from page to page, searching for new sites. This is a continuous process that may be compared to painting the Golden Gate Bridge: reach one "end" and the other needs a fresh coat. The key difference, of course, is that the Golden Gate Bridge isn't continually adding new lanes; the "information superhighway" is. The spider sends what it finds to the catalog, where all of the new sites are organized and stored. At the same time, the spider updates the catalog so that older sites that have ceased to exist are de-indexed. Up to this point, virtually every automated engine works in a similar manner.
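To make the spider and the catalog more concrete, here is a minimal sketch in Python. It is only an illustration, not how any real engine is built: the "web" is a small hypothetical dictionary (TOY_WEB), the site names are invented, and the function names crawl and build_catalog are assumptions made for this example. The spider follows links from page to page, the catalog records which pages contain which words, and crawling again after a site disappears drops that site from the catalog.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to (page text, outgoing links).
# "gone.example" is linked to but no longer exists.
TOY_WEB = {
    "a.example": ("golden gate bridge paint", ["b.example", "c.example"]),
    "b.example": ("bridge traffic lanes", ["a.example"]),
    "c.example": ("search engine spider", ["b.example", "gone.example"]),
}

def crawl(start, web):
    """Spider: follow links breadth-first and collect the text of every live page."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        if url not in web:
            # Dead link: the site has ceased to exist, so there is nothing to store.
            continue
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_catalog(pages):
    """Catalog: map each word to the set of URLs whose text contains it."""
    catalog = {}
    for url, text in pages.items():
        for word in text.split():
            catalog.setdefault(word, set()).add(url)
    return catalog

catalog = build_catalog(crawl("a.example", TOY_WEB))

# Simulate a site going offline, then crawl again to refresh the catalog.
smaller_web = {url: page for url, page in TOY_WEB.items() if url != "b.example"}
recrawled = build_catalog(crawl("a.example", smaller_web))
```

In this sketch the catalog is simply rebuilt from scratch on each crawl, which is the easiest way to show de-indexing; updating an existing catalog in place would be a different design.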

Search engines gain their unique qualities from their sorters - proprietary software that sifts through the indexed sites and retrieves them when requested by users. Each search engine sorts its catalog of sites in a different way. Hand-indexed engines (also called directories) use a hierarchical system of increasing specialization. Thus, if you visit Yahoo, you might find a generic category such as "Social Science," a more specific category like "Political Science" within it, and an even more specialized set of links (like "International Relations") within that one! In contrast, automated search engines sort their catalogs of sites "on the fly" - that is, in a way that depends on the wording of your query.
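The contrast between a directory and an "on the fly" sorter can be sketched in a few lines of Python. Everything here is hypothetical: the site names, the DIRECTORY and PAGES data, and the browse and search functions are invented for illustration, and the scoring (counting how often the query words appear) is a deliberately simple stand-in for whatever a real engine does.

```python
# Hypothetical hand-built directory: categories nested by increasing specialization.
DIRECTORY = {
    "Social Science": {
        "Political Science": {
            "International Relations": ["ir.example", "un.example"],
        },
    },
}

def browse(tree, path):
    """Directory lookup: walk down the category tree one level at a time."""
    node = tree
    for category in path:
        node = node[category]
    return node

# Hypothetical indexed pages for the automated engine.
PAGES = {
    "un.example": "international relations and treaties",
    "vote.example": "elections voting and political science",
    "ir.example": "international relations theory international law",
}

def search(query, pages):
    """On-the-fly sort: score each page against this query, best match first."""
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    return [url for score, url in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]

links = browse(DIRECTORY, ["Social Science", "Political Science", "International Relations"])
hits = search("international relations", PAGES)
```

The directory returns the same fixed list to every visitor who opens a category, while the search function produces a different ordering for each query.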

Each search engine maintains its own set of priorities for ranking the sites you see. Each one focuses on the words in the title section of the webpage and the number of words in the body of the site that relate to your request. However, some engines, like Google, prioritize pages that have many links pointing to them. Therefore, your "hits" are likely to be sites that other folks find useful as well. Some engines, like GoTo, offer a hybrid of automated and hand-indexed results, but the first hits you receive have generally been placed there by paying customers. Some of the most intriguing search engines, like Ask Jeeves, employ sophisticated programming to make sense of questions without requiring you to use particular forms of syntax (described in the module "How can I sharpen my searches?").
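As a rough illustration of how such priorities could combine, the Python sketch below scores pages on three signals mentioned above: query words in the title, query words in the body, and the number of other pages linking in. The pages, the rank function, and the weights (title_weight, link_weight) are all assumptions chosen for this example; real engines keep their formulas proprietary, and the simple link count here is not Google's actual method.

```python
# Hypothetical pages: a title, body text, and the pages each one links to.
PAGES = {
    "bridges.example": {"title": "golden gate bridge",
                        "body": "the bridge spans the bay",
                        "links": []},
    "paint.example": {"title": "paint supplies",
                      "body": "bridge paint and more bridge paint",
                      "links": ["bridges.example"]},
    "tour.example": {"title": "city tours",
                     "body": "visit the bridge",
                     "links": ["bridges.example"]},
}

def inbound_counts(pages):
    """Count how many links point at each known page."""
    counts = {url: 0 for url in pages}
    for page in pages.values():
        for target in page["links"]:
            if target in counts:
                counts[target] += 1
    return counts

def rank(query, pages, title_weight=3, link_weight=2):
    """Order matching pages by a weighted mix of title, body, and link signals."""
    terms = query.lower().split()
    inbound = inbound_counts(pages)
    scores = {}
    for url, page in pages.items():
        title_hits = sum(page["title"].lower().split().count(t) for t in terms)
        body_hits = sum(page["body"].lower().split().count(t) for t in terms)
        if title_hits + body_hits == 0:
            continue  # the page does not mention the query at all
        scores[url] = (title_weight * title_hits
                       + body_hits
                       + link_weight * inbound[url])
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

with_links = rank("bridge", PAGES)
body_only = rank("bridge", PAGES, title_weight=0, link_weight=0)
```

Changing the weights changes the order of the hits: with the default weights the page that others link to comes first, while counting body words alone puts the page that repeats the word most often on top. This is one way to see why the same query can return different results on different engines.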

Activity: Develop a list of the three criteria in web searching that are most important to you. Some criteria might include size, popularity, clustering, and frequency of updating. Visit Danny Sullivan's Directory of Search Engines and select the engine that most closely meets your priorities.