Automatic Structuring of Text Files

SUMMARY: In many practical information retrieval situations, it is necessary to process heterogeneous text databases that vary greatly in scope and coverage, and deal with many different subjects. In such an environment it is important to provide flexible access to individual text pieces, and to structure the collection so that related text elements are identified and appropriately linked. Methods are described in this study for the automatic structuring of heterogeneous text collections, and the construction of browsing tools and access procedures that facilitate collection use. The proposed methods are illustrated by performing searches with a large automated encyclopedia.
