Subtopic structuring for full-length document access

We argue that the advent of large volumes of full-length text, as opposed to short texts such as abstracts and newswire, should be accompanied by correspondingly new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents, that is, partitioning the text into coherent multi-paragraph units that reflect the pattern of subtopics that make up the text. Using this structure, we can distinguish between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognizing subtopic structure is important and how it can be found with some degree of accuracy. We describe a new way of specifying queries on full-length documents, and then describe an experiment in which exploiting the recognized local structure achieves better results on a typical information retrieval task than a standard IR measure does.
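
The abstract describes two steps: partitioning a document into coherent multi-paragraph units, and then separating main topics, which recur throughout the text, from subtopics of limited extent. The sketch below illustrates both ideas under simplifying assumptions; it is not the authors' algorithm. It places a segment boundary wherever the lexical similarity of adjacent paragraphs drops below a hypothetical `threshold` (a crude form of lexical-cohesion segmentation), and labels a term a main topic if it occurs in at least a hypothetical fraction `spread` of the resulting segments. The stoplist, both cutoffs, and the function names `segment` and `classify_terms` are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stoplist (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "from", "for", "after"}


def tokenize(text):
    """Lowercase word tokens minus stopwords; no stemming."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(paragraphs, threshold=0.1):
    """Group consecutive paragraph indices into multi-paragraph units.

    A boundary is placed between paragraphs i-1 and i when their lexical
    similarity falls below `threshold` (a hypothetical cutoff).
    """
    if not paragraphs:
        return []
    vectors = [Counter(tokenize(p)) for p in paragraphs]
    segments, current = [], [0]
    for i in range(1, len(paragraphs)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            segments.append(current)  # similarity dip: close the current unit
            current = []
        current.append(i)
    segments.append(current)
    return segments


def classify_terms(paragraphs, segments, spread=0.75):
    """Label each term 'main' if it occurs in at least `spread` of the
    segments (i.e. throughout the text), otherwise 'sub' (limited extent)."""
    seg_terms = [{t for i in seg for t in tokenize(paragraphs[i])} for seg in segments]
    counts = Counter(t for terms in seg_terms for t in terms)
    n = len(segments)
    return {t: ("main" if c / n >= spread else "sub") for t, c in counts.items()}


# Usage on a toy four-paragraph "document".
paragraphs = [
    "The volcano erupted and lava covered the slope.",
    "Lava from the volcano buried the slope road.",
    "Tourism to the volcano region grew after the eruption.",
    "Tourism brought hotels and jobs to the region.",
]
segments = segment(paragraphs, threshold=0.3)
labels = classify_terms(paragraphs, segments)
print(segments)            # [[0, 1], [2, 3]]
print(labels["volcano"])   # main
```

In this toy example "volcano" spans both units and is labeled a main topic, while "lava" and "tourism" are confined to one unit each and are labeled subtopic terms. Real documents would call for smoothing over windows of several paragraphs rather than single adjacent pairs.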
