Best Entry Pages for the Topic Distillation Task

In a typical web search, users consider entry pages to relevant sites more valuable than isolated pieces of relevant text. The Topic Distillation Task aims to identify the page, at the right level of the site hierarchy, that provides optimal access by browsing to the relevant pages within the site, i.e. its Best Entry Page. Our aim is to estimate how good a page is as an entry page to the site to which it belongs, by aggregating the page’s system-assessed relevance with that of the structurally related Web pages belonging to the same site. To model this aggregation, we propose a framework expressed within the Dempster-Shafer Theory of Evidence. Furthermore, we generalise our model to take into account other system-assessed properties of Web pages: apart from their relevance, the authority and hub properties of Web pages are considered in the aggregation. We evaluate our approach by performing experiments using the .GOV test collection. The results of these experiments are promising.
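As a rough illustration of the kind of evidence aggregation described above, the following minimal sketch applies Dempster's rule of combination to two mass functions. It is not the framework proposed in the paper: the two-element frame (`good_entry`, `poor_entry`), the function names `combine` and `belief`, and the mass values are all assumptions chosen for illustration only.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (focal element) to a mass,
    and the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory focal elements.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Belief in a hypothesis: total mass of focal elements contained in it."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical frame of discernment (an assumption, not from the paper).
GOOD = frozenset({"good_entry"})
FRAME = frozenset({"good_entry", "poor_entry"})

# Hypothetical evidence: the page's own relevance score and that of a
# structurally related page on the same site; the rest is left uncommitted.
page_evidence = {GOOD: 0.6, FRAME: 0.4}
related_evidence = {GOOD: 0.5, FRAME: 0.5}

aggregated = combine(page_evidence, related_evidence)
print(round(belief(aggregated, GOOD), 3))
```

In this sketch, mass left on the whole frame represents uncommitted belief, so combining two partially supportive sources raises the belief that the page is a good entry page above either source alone.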
