The Accessibility Dimension for Structured Document Retrieval

Structured document retrieval aims to retrieve the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and we carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
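The abstract does not state how acc enters the score, and the authors define their model in a probabilistic relational algebra rather than in code. The Python sketch below is therefore only an illustration under explicit assumptions. It assumes that each component (a document or one of its sections) has its own tf-idf term weights, that a parent's weights are augmented by its children's weights scaled by an accessibility value acc between 0 and 1, and that idf is computed over the components that contain text. The names `Component`, `idf_table`, `weights`, and `rank` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a document tree, e.g. a document or a section (hypothetical structure)."""
    name: str
    text: str = ""
    children: list[Component] = field(default_factory=list)

    def all_components(self):
        """Yield this component and all of its descendants, pre-order."""
        yield self
        for child in self.children:
            yield from child.all_components()


def idf_table(components):
    """Assumption: idf(t) = log(N / n_t), where N counts only components that contain text."""
    bags = [set(c.text.split()) for c in components if c.text]
    df = Counter(t for bag in bags for t in bag)
    return {t: math.log(len(bags) / n_t) for t, n_t in df.items()}


def weights(component, idf, acc):
    """Own tf-idf weights plus acc times each child's weights (assumed propagation rule)."""
    tokens = component.text.split()
    counts = Counter(tokens)
    # Relative term frequency times idf; the loop is empty when the component has no text.
    w = {t: (n / len(tokens)) * idf.get(t, 0.0) for t, n in counts.items()}
    for child in component.children:
        for t, child_w in weights(child, idf, acc).items():
            w[t] = w.get(t, 0.0) + acc * child_w
    return w


def rank(root, query, acc):
    """Score every component by summing the weights of the query terms; best first."""
    components = list(root.all_components())
    idf = idf_table(components)
    terms = query.split()
    scored = []
    for c in components:
        w = weights(c, idf, acc)
        scored.append((sum(w.get(t, 0.0) for t in terms), c.name))
    return sorted(scored, reverse=True)
```

For a document with two sections, `"xml retrieval structure"` and `"cooking recipes pasta"`, the query `"xml retrieval"` at `acc=0.5` ranks the first section highest and gives the parent document half of that section's score. With `acc=0.0` the parent, which has no text of its own, scores zero. Under this assumed rule, acc controls how much a component's structure, rather than only its own text, contributes to its relevance.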
