One of the enabling technologies of the World Wide Web, along with browsers, domain name servers, and the Hypertext Markup Language (HTML), is the search engine. Although the Web contains over 100 million pages of information, those millions of pages are useless if you cannot find the pages you need. All major Web search engines operate the same way. First, a gathering program explores the hyperlinked documents of the Web, foraging for Web pages to index. Next, these pages are stockpiled in some kind of database or repository. Finally, a retrieval program takes a user query and creates a list of links to Web documents matching the words, phrases, or concepts in the query. Although only the retrieval program itself is correctly called a search engine, by popular usage the term now means a database combined with a retrieval program. For example, the Lycos search engine comprises the Lycos Catalog of the Internet and the Pursuit retrieval program. This paper describes the Lycos system for collecting, storing, and retrieving information about pages on the Web. After outlining the history and precursors of the Lycos system, the paper discusses some of the design choices made in building this Web indexer and touches briefly on the economic issues involved in working with very large retrieval systems.
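The gather, store, and retrieve stages described above can be illustrated with a minimal sketch. It is not the Lycos or Pursuit implementation: the in-memory `TOY_WEB` dictionary, the function names, and the simple all-words matching rule are assumptions chosen only to make the three stages concrete.

```python
from collections import defaultdict, deque

# Hypothetical toy "Web" (an assumption for illustration):
# URL -> (page text, list of outgoing links).
TOY_WEB = {
    "a.html": ("lycos indexes web pages", ["b.html", "c.html"]),
    "b.html": ("a spider explores hyperlinked web documents", ["a.html"]),
    "c.html": ("retrieval program answers a user query", []),
}


def gather(start, web):
    """Gathering stage: follow links breadth-first from a start page."""
    seen = {start}
    queue = deque([start])
    pages = []
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages.append((url, text))
        for link in links:
            # Skip dangling links and pages that were already queued.
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def store(pages):
    """Storage stage: build an inverted index mapping word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.lower().split():
            index[word].add(url)
    return index


def retrieve(query, index):
    """Retrieval stage: return URLs of pages containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    index = store(gather("a.html", TOY_WEB))
    print(retrieve("web", index))
    print(retrieve("user query", index))
```

A production system would replace the dictionary with network fetches, persist the index on disk, and rank results rather than returning them in alphabetical order; the sketch only shows how the three programs hand data to one another.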