Cases as Structured Indexes for Full-Length Documents

Two full-length texts are unlikely to discuss all, or nearly all, of the same subtopics. Even when two documents share many terms, the ways those terms group into subtopical discussions may differ considerably. One solution is to describe each document by listing its subtopical discussions alongside its main topics. An index that captures this structure is an abstract representation of the document, and it can be treated as a case in the Case-Based Reasoning (CBR) sense. This paper proposes using cases to represent the high-level structure of full-length documents for information retrieval. The cases serve two purposes: assessing document similarity and helping the user construct viable queries. A case can be transformed to make it more similar to the descriptions of other documents; the transformations include generalizing, substituting, and emphasizing subtopic descriptions. An advantage of this approach is that the cases representing a document can be generated automatically.
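To make the idea of a case and its transformations concrete, the following is a minimal sketch and not the paper's implementation. It assumes a case can be modeled as a set of main-topic terms plus a list of weighted subtopic term sets. The names `Subtopic`, `Case`, `generalize`, `substitute`, `emphasize`, and `similarity`, the Jaccard-based scoring, and the equal weighting of topic and subtopic scores are all illustrative assumptions that do not come from the abstract.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Subtopic:
    terms: frozenset      # terms characterizing one subtopical discussion
    weight: float = 1.0   # emphasis given to this subtopic (assumed scheme)


@dataclass(frozen=True)
class Case:
    main_topics: frozenset  # terms describing the document as a whole
    subtopics: tuple        # tuple of Subtopic, in document order


def _with_subtopic(case, index, sub):
    """Return a copy of `case` with the subtopic at `index` replaced by `sub`."""
    subs = case.subtopics[:index] + (sub,) + case.subtopics[index + 1:]
    return replace(case, subtopics=subs)


def generalize(case, index, broader_terms):
    """Map each term of one subtopic to a broader term when one is known."""
    sub = case.subtopics[index]
    terms = frozenset(broader_terms.get(t, t) for t in sub.terms)
    return _with_subtopic(case, index, replace(sub, terms=terms))


def substitute(case, index, new_subtopic):
    """Swap one subtopic description for another."""
    return _with_subtopic(case, index, new_subtopic)


def emphasize(case, index, factor):
    """Scale the weight of one subtopic so it counts more (or less)."""
    sub = case.subtopics[index]
    return _with_subtopic(case, index, replace(sub, weight=sub.weight * factor))


def jaccard(x, y):
    """Set overlap in [0, 1]; defined as 0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity(a, b):
    """Average of main-topic overlap and weighted best-match subtopic overlap.

    The measure is asymmetric: it asks how well b covers a's subtopics.
    """
    topic_score = jaccard(a.main_topics, b.main_topics)
    total_weight = sum(s.weight for s in a.subtopics)
    if total_weight == 0 or not b.subtopics:
        return 0.5 * topic_score
    sub_score = sum(
        s.weight * max(jaccard(s.terms, t.terms) for t in b.subtopics)
        for s in a.subtopics
    ) / total_weight
    return 0.5 * topic_score + 0.5 * sub_score


# Hypothetical example documents, invented for illustration only.
doc_a = Case(
    frozenset({"volcano"}),
    (Subtopic(frozenset({"lava", "eruption"})),
     Subtopic(frozenset({"tourism", "hotel"}))),
)
doc_b = Case(
    frozenset({"volcano"}),
    (Subtopic(frozenset({"magma", "eruption"})),),
)

generalized = generalize(doc_a, 0, {"lava": "magma"})
emphasized = emphasize(generalized, 0, 3.0)
```

In this toy setting, generalizing the first subtopic of `doc_a` and then emphasizing it each raise its similarity to `doc_b`, which mirrors the abstract's point that a case can be transformed to bring it closer to other documents' descriptions. The transformations return new cases instead of mutating the original, so the untransformed index stays available for comparison.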
