A Model for the Representation and Focussed Retrieval of Structured Documents Based on Fuzzy Aggregation

Effective retrieval of structured documents should exploit both the content of the documents and the structural knowledge associated with them. This knowledge can be used to focus retrieval on the best entry points: document components that contain relevant information and from which users can browse to further relevant components. To enable this, suitable representation methods must be developed. This paper presents a model for representing structured documents that allows for their focussed retrieval. The model is founded on fuzzy aggregation, an approach based on the fuzzy representation of linguistic quantifiers and on ordered weighted averaging (OWA) operators. By defining the representation of a document component as the fuzzy aggregation of the representations of its related components, we arrive at a document representation that supports the selection of best entry points.
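The aggregation idea can be illustrated with a minimal sketch. It assumes the standard quantifier-guided construction of OWA weights, w_i = Q(i/n) - Q((i-1)/n), for a regular increasing monotone quantifier Q. The power-function quantifier Q(r) = r^alpha, the representation of a component as a dictionary of term weights, and all function names are assumptions made for illustration, not the formulation used in the paper.

```python
from typing import Callable, Dict, List, Sequence

Quantifier = Callable[[float], float]


def rim_quantifier(alpha: float) -> Quantifier:
    """Hypothetical linguistic quantifier Q(r) = r**alpha on [0, 1].

    alpha = 1 yields the arithmetic mean; smaller alpha behaves more like
    "at least one" (towards max), larger alpha more like "all" (towards min).
    """
    def q(r: float) -> float:
        return r ** alpha
    return q


def owa_weights(n: int, quantifier: Quantifier) -> List[float]:
    """Quantifier-guided OWA weights: w_i = Q(i/n) - Q((i-1)/n)."""
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]


def owa(values: Sequence[float], quantifier: Quantifier) -> float:
    """Ordered weighted average: weights apply to values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    weights = owa_weights(len(ordered), quantifier)
    return sum(w * v for w, v in zip(weights, ordered))


def aggregate_component(
    children: List[Dict[str, float]], quantifier: Quantifier
) -> Dict[str, float]:
    """Build a parent representation by aggregating, term by term, the
    term weights of its child components (assumed data layout)."""
    terms = set().union(*children)
    return {
        term: owa([child.get(term, 0.0) for child in children], quantifier)
        for term in terms
    }


if __name__ == "__main__":
    # Illustrative term weights for three sections of one document.
    sections = [
        {"fuzzy": 0.9, "retrieval": 0.2},
        {"fuzzy": 0.4},
        {"fuzzy": 0.2, "retrieval": 0.7},
    ]
    print(aggregate_component(sections, rim_quantifier(2.0)))
```

In this sketch the choice of quantifier controls how many child components must support a term before it contributes strongly to the parent representation, which is the lever a best-entry-point selection would depend on.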
