Paper Information - A Modern Approach to Searching the World Wide Web: Ranking Pages by Inference over Content

A Modern Approach to Searching the World Wide Web: Ranking Pages by Inference over Content

Hypertext-based webs such as intranets contain a vast amount of information on an enormous number of subjects. They are, however, organically grown and thus essentially structureless environments in a constant state of flux, so finding useful information on a particular topic is often difficult. Search engines were designed to ease the burden on individuals perusing the Web for specific topics. Traditionally, Web search engines have used straightforward, and relatively naïve, approaches to indexing and ranking pages on a particular subject. As our understanding of hyperlinked environments has improved, algorithmic tools have been developed that more effectively distill the plethora of information in this environment. We briefly discuss the history of the World Wide Web, the approaches employed by “traditional” search engines, and how alternative techniques can improve upon older approaches. We find that new techniques build upon, rather than replace, previous approaches, and that the problem of searching the Web evolves as our understanding of the Web’s structure improves.

Edgar R. Weippl | Werner Winiwarter | Bronson Trevor

[1] Koichi Takeda et al. Information Retrieval on the Web, 2000, CSUR.

[2] T. Landauer et al. Indexing by Latent Semantic Analysis, 1990.

[3] Dell Zhang et al. An Efficient Algorithm to Rank Web Resources, 2000, Comput. Networks.

[4] Giles et al. Searching the World Wide Web, 1998, Science.

[5] C. Lee Giles et al. Accessibility of Information on the Web, 1999, Nature.

[6] Gerard Salton et al. The SMART Retrieval System: Experiments in Automatic Document Processing, 1971.

[7] Sergey Brin et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.