A Model for Representing and Retrieving Heterogeneous Structured Documents Based on Evidential Reasoning

Documents often have an internal structure: they are composed of components. For example, a journal contains several articles, which in turn contain paragraphs, tables, and so on. With structured documents, the retrievable units should be the document components as well as the whole document. The components of a structured document can be heterogeneous: they may be in different media, located at different sites, or written in different languages. An information retrieval model for heterogeneous structured documents must take this disparity among document components into account. We present a model for representing and retrieving heterogeneous structured documents, that is, multimedia, distributed, and multilingual documents. The model is based on evidential reasoning, a formal theory that allows for the representation and combination of knowledge. Here, knowledge is the content of document components. We show that the model provides for an appropriate representation and retrieval of heterogeneous structured documents.
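The abstract does not spell out how the content of components is represented or combined. As an illustration only, the sketch below assumes the Dempster-Shafer form of evidential reasoning: each component's content is a mass function over sets of topics, and two components are combined with Dempster's rule. The function names `combine` and `belief`, the topic frame, and all mass values are hypothetical and are not taken from the paper.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    A mass function is a dict mapping a frozenset of topics (a focal
    element) to its mass; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so that the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical frame of topics and two hypothetical components
# (for example, a text paragraph and an image caption) of one document.
FRAME = frozenset({"retrieval", "logic", "music"})
paragraph = {frozenset({"retrieval"}): 0.6, FRAME: 0.4}
caption = {frozenset({"retrieval", "logic"}): 0.5, FRAME: 0.5}

# Evidence about the parent document, aggregated from its components.
document = combine(paragraph, caption)
```

Mass assigned to the whole frame expresses ignorance, so a component that says little about a topic does not count as evidence against it. This is one reason a belief-function representation is a plausible fit for components that differ in medium, site, or language.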
