A Dempster-Shafer indexing for the focused retrieval of a hierarchically structured document space: Implementation and experiments on a web museum collection

Effective retrieval of hierarchically structured web documents should exploit both the content of the documents and the structural knowledge associated with them. This knowledge can be used to retrieve optimal documents: documents that contain relevant information and from which users can browse, by following links, to further relevant documents. We refer to this approach as focused retrieval. This paper investigates the effectiveness of a model, based on the Dempster-Shafer theory of evidence, for the focused retrieval of hierarchically structured web documents. To allow for focused retrieval, the representation of a document is defined as the aggregation of the representation of its own content and the representations of its child documents. To evaluate the model, we constructed a test collection based on a museum web site. Our experiments on this collection show that the Dempster-Shafer theory, and in particular the aggregation, leads to effective focused retrieval of hierarchically structured web documents.
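To make the aggregation idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a document's representation is a Dempster-Shafer mass function over sets of index terms and that aggregation is performed with Dempster's rule of combination; the abstract does not spell out the operator, so both are assumptions. The names `combine`, `aggregate`, and the toy frame `{"painting", "sculpture"}` are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (focal element) to its mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass that would fall on the empty set is treated as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; combination undefined")
    normaliser = 1.0 - conflict
    return {focal: mass / normaliser for focal, mass in combined.items()}


def aggregate(own, children):
    """Fold the child representations into the parent's own representation.

    Assumption: aggregation is repeated application of Dempster's rule.
    """
    result = own
    for child in children:
        result = combine(result, child)
    return result


# Hypothetical toy frame of discernment with two index terms.
frame = frozenset({"painting", "sculpture"})

# Parent page: its own content mostly supports "painting".
own = {frozenset({"painting"}): 0.6, frame: 0.4}
# One child page: its content partly supports "sculpture".
child = {frozenset({"sculpture"}): 0.5, frame: 0.5}

aggregated = aggregate(own, [child])
for focal, mass in sorted(aggregated.items(), key=lambda item: sorted(item[0])):
    print(sorted(focal), round(mass, 4))
```

In this sketch the aggregated representation of the parent carries belief about both terms, which reflects the intuition in the abstract: a parent document can be a good entry point for a query even when the relevant content sits in one of its children.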
