Paper Information - Contextualizing Retrieval of Full-Length Documents

We address some issues relating to retrieval from unfamiliar text collections consisting of full-length documents. We claim that displaying query results in terms of inter-document similarity is inappropriate with long texts, and suggest instead that the results of simple initial queries should be contextualized according to category sets that correspond to the main topics of the texts. We argue that main topics of long texts should be represented by multiple categories, since in most cases one category cannot adequately classify a text. We describe a new automatic categorization algorithm that does not require pre-labeled texts and a prototype browsing interface that presents a simple mechanism for displaying multi-dimensional information.

Marti A. Hearst
