Web searching and information retrieval

The first Web information services were based on traditional information retrieval (IR) algorithms and techniques. However, IR algorithms were developed for collections that are smaller and more coherent than the Web. Web searching therefore requires either new techniques, such as exploiting the links among Web pages, or extensions of the old ones. This article offers an overview of today's search engine architectures and techniques in the context of IR. The authors introduce three such architectures and describe their basic components. They then discuss the most important feature of each Web search process: page importance and its use in retrieval. Finally, they summarize some issues and challenges facing Web search engines and consider the future of Web searching in terms of the so-called semantic Web.
