
Module One: Searching the web - Introduction

The amount of information on the internet continues to grow at an astronomical rate. In research published in Nature (July 8, 1999), Steve Lawrence and C. Lee Giles estimate that the publicly indexable World Wide Web contains 800 million pages. More recently, that number has been estimated to top one billion pages (Inktomi Web Map). A search for a particular topic can turn up hundreds or thousands of options. Some of these options, generally web pages created by subject matter experts, offer a wealth of data. However, most of the information relating to your chosen topic is tangential at best and often downright crummy. So the question remains: how can you separate the wheat from the chaff?

Assuming that your primary task is to search web pages for information, WWW search engines are helpful tools. However, they do not all work the same way. Some search engines are hand indexed. In other words, a person or committee selects web sites to place in an index based on a particular set of criteria. Other search engines are automatically indexed. These sites use computer programs to search the WWW for sites; then they organize the site links into their own indexing system. There are also "meta" search engines that do not organize sites; instead, they search other search engines! Focusing on the first two types of search engines, it's generally best to use a hand-indexed site if your primary concern is finding "quality" pages -- but you're likely to find a larger number of options by using automatically indexed sites.
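To make the difference between automatic indexing and "meta" searching concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the page URLs and text are made up, and the function names (build_index, search, meta_search) are hypothetical. It does not describe how any real search engine is implemented.

```python
from collections import defaultdict

# Hypothetical toy "web": URL -> page text. Made-up data for illustration.
PAGES = {
    "http://example.edu/a": "san jose state university library",
    "http://example.org/b": "alaska population statistics",
    "http://example.com/c": "university humor pages",
}

def build_index(pages):
    """Automatic indexing: map every word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

def meta_search(engines, query):
    """Meta search: ask each engine in turn, merge results, drop duplicates."""
    merged = []
    for engine in engines:
        for url in engine(query):
            if url not in merged:
                merged.append(url)
    return merged
```

A hand-indexed engine could be modeled here as a function that returns a short, fixed list of vetted URLs. Passing it to meta_search alongside the automatic search shows why meta engines tend to return more results than either source alone.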

The most popular hand-indexed site is Yahoo. Yahoo is organized in categories like Arts & Humanities and Business & Economy. To search the site, enter keywords in the field located near the top of the screen. A recent search for the entry "San Jose State University" resulted in 120 "hits", or pages that included that phrase somewhere in their title. Benefits of Yahoo include ease of use (just type keywords) and relatively high quality sites. The main limitation is that a hand-built index catalogues only a small fraction of the web.

Some of the best automated search indexes are Hotbot and Altavista. Hotbot offers a great deal of flexibility in your searches, allowing you to look for keywords, specific phrases, or web addresses. The last option lets page designers see how many other pages link to a particular site. Altavista is slightly less comprehensive -- searches generally offer fewer useful "hits" than Hotbot -- but you can phrase your searches in the form of questions such as "what is the population of Alaska?" It is usually a good idea to try two or three search engines before concluding your online search. Also, remember that various interest groups maintain searchable indexes of specific kinds of pages. For example, some search engines focus solely on pages about South Africa, while others concentrate on humor. Use Yahoo's list of search engines (enter "search engines" in the Yahoo entry field to get this list) to narrow your list of options. If you want to learn more about search engines, check out a site maintained by the Kansas City Public Library.
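The three kinds of search described for Hotbot above (keywords, exact phrases, and web addresses) can be sketched in a few lines of Python. This is a rough illustration only: the page data is invented, the function names are hypothetical, and the matching is deliberately simplistic compared with what a real engine does.

```python
# Hypothetical pages: each has a URL, some text, and a list of outgoing links.
PAGES = [
    {"url": "http://example.edu/",
     "text": "San Jose State University home page",
     "links": []},
    {"url": "http://example.org/fans",
     "text": "State parks near San Jose",
     "links": ["http://example.edu/"]},
    {"url": "http://example.com/jokes",
     "text": "University humor",
     "links": ["http://example.edu/"]},
]

def keyword_search(pages, query):
    """Pages containing every query word, in any order."""
    words = query.lower().split()
    return [p["url"] for p in pages
            if all(w in p["text"].lower().split() for w in words)]

def phrase_search(pages, phrase):
    """Pages containing the exact phrase (simple substring match)."""
    return [p["url"] for p in pages if phrase.lower() in p["text"].lower()]

def link_search(pages, target):
    """Pages that link to the given web address."""
    return [p["url"] for p in pages if target in p["links"]]
```

With this sample data, a keyword search for "san jose state" matches two pages, while the exact phrase matches only one. That is why phrase searches usually return fewer but more relevant hits. The link search returns the pages that point at a given address, which is the count a page designer would be interested in.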