Retrieval activities in a database consisting of heterogeneous collections of structured text

The first part of this paper briefly describes a mathematical framework, called the containment model, that provides the operations and data structures for a text-dominated database with a hierarchical structure. The database is treated as a hierarchical collection of contiguous extents, each extent being a word, a phrase, a text element, or a non-text element. The filter operations that make up a search command are expressed as containment criteria, which specify whether a contiguous extent is selected or rejected during a search. This formalism, comprising the mathematical framework and its associated language, defines a conceptual layer on which a well-defined higher-level layer can be built: the user interface, which provides functionality closer to the needs of the user and the application domain. With the conceptual layer established, we go on to describe the design and implementation of a versatile interface that handles queries for searching and navigating a heterogeneous collection of structured documents. Interface functionality is provided by a set of “worker” modules supported by an “environment” that is the same for all interfaces. The interface environment allows a worker to communicate with the underlying text retrieval engine through a well-defined command protocol based on a small set of filter operators. The overall design emphasizes: a) interface flexibility across a variety of search and browsing capabilities, b) the modular independence of the interface from its underlying retrieval engine, and c) the advantages gained by defining retrieval commands with operators from a text algebra that provides a sound theoretical foundation for the database.
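The idea of filtering extents by containment criteria can be illustrated with a minimal sketch. The abstract does not give the paper's actual operators or syntax, so everything here is an assumption made for illustration: the `Extent` class, the word-position representation, and the function names `select_containing`, `select_contained_in`, and `reject_containing` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    """A contiguous run of word positions (hypothetical representation)."""
    start: int   # position of the first word in the extent
    end: int     # position of the last word, inclusive
    label: str   # e.g. "section", "word:algebra" (illustrative only)

    def contains(self, other: "Extent") -> bool:
        # An extent contains another if it spans all of its positions.
        return self.start <= other.start and other.end <= self.end


def select_containing(candidates: List[Extent], targets: List[Extent]) -> List[Extent]:
    """Keep candidates that contain at least one target extent."""
    return [c for c in candidates if any(c.contains(t) for t in targets)]


def select_contained_in(candidates: List[Extent], containers: List[Extent]) -> List[Extent]:
    """Keep candidates that lie inside at least one container extent."""
    return [c for c in candidates if any(k.contains(c) for k in containers)]


def reject_containing(candidates: List[Extent], targets: List[Extent]) -> List[Extent]:
    """Drop candidates that contain any target extent."""
    return [c for c in candidates if not any(c.contains(t) for t in targets)]


# A toy document of ten word positions split into two sections.
sections = [Extent(0, 4, "section"), Extent(5, 9, "section")]
algebra = [Extent(2, 2, "word:algebra")]
text = [Extent(7, 7, "word:text")]

with_algebra = select_containing(sections, algebra)
without_algebra = reject_containing(sections, algebra)
words_in_first = select_contained_in(algebra + text, [sections[0]])
```

In this sketch a search command is a composition of such filters over lists of extents, which is one simple way to read "selected or rejected" in terms of containment; a real retrieval engine would work from an index rather than scanning lists.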
