Selective text utilization and text traversal

Abstract

Many large collections of full-text documents are currently stored in machine-readable form and processed automatically in various ways. These collections may include different types of documents, such as messages, research articles, and books, and the subject matter may vary widely. To process such collections, robust text analysis methods must be used, capable of handling materials in arbitrary subject areas, and flexible access must be provided to texts and text excerpts of varying size. In this study, global text comparison methods are used to identify similarities between text elements, followed by local context-checking operations that resolve ambiguities and distinguish superficially similar texts from texts that actually cover identical topics. A linked text structure, known as a text relationship map, is then created that relates similar texts at various levels of detail. In particular, text links are available for full texts, as well as text sections, paragraphs, and sentence groups. The relationship graphs are usable as conceptualization tools to illustrate various text manipulation operations and may also serve as browsing maps in situations where searches or text traversal operations are conducted under user control. In this study, the relationship maps are used to identify important text passages, to traverse texts selectively both within particular documents and between documents, and to provide flexible text access to large text collections in response to various kinds of user needs. An automated 29-volume encyclopedia is used as an example to illustrate various possible text accessing and traversal operations. Implementation details are not included in this initial study.
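The two-stage comparison described in the abstract (a global match followed by a local context check) can be illustrated with a minimal Python sketch. Since the study itself gives no implementation details, everything here is an assumption made for illustration: raw term-frequency vectors, cosine similarity, a sentence-level check standing in for the local context-checking operation, and the threshold values. The function names (`relationship_map`, `locally_similar`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def locally_similar(text_a, text_b, local_threshold):
    """Assumed local check: at least one sentence pair must match closely."""
    vecs_a = [Counter(tokenize(s)) for s in split_sentences(text_a)]
    vecs_b = [Counter(tokenize(s)) for s in split_sentences(text_b)]
    return any(cosine(x, y) >= local_threshold for x in vecs_a for y in vecs_b)


def relationship_map(texts, global_threshold=0.3, local_threshold=0.5):
    """Link pairs of texts that pass both the global and the local test.

    `texts` maps a text identifier to its body. The result maps each linked
    pair (a, b), with a < b, to its global similarity score.
    """
    vectors = {name: Counter(tokenize(body)) for name, body in texts.items()}
    links = {}
    for a, b in combinations(sorted(texts), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= global_threshold and locally_similar(
            texts[a], texts[b], local_threshold
        ):
            links[(a, b)] = score
    return links
```

Calling `relationship_map` on a dictionary of paragraphs, sections, or full documents returns the linked pairs, which can then be treated as the edges of a browsing graph. The same function applies at any granularity because it only sees identifiers and text bodies.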
