Document Ranking and the Vector-Space Model

Efficient and effective text retrieval techniques are critical for managing the growing volume of textual information available in electronic form. Yet text retrieval is a daunting task because the semantics of natural language text are difficult to extract, and many problems must be resolved before natural language processing techniques can be applied effectively to large text collections. Most existing text retrieval techniques rely on keyword indexing. Keywords or index terms alone cannot adequately capture document content, which results in poor retrieval performance, but keyword indexing remains widely used in commercial systems because it is still by far the most viable way to process large amounts of text. Using several simplifications of the vector-space model for text retrieval queries, the authors seek the optimal balance between processing efficiency and retrieval effectiveness, as expressed in relevant document rankings.
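For readers unfamiliar with the vector-space model named in the abstract, the following is a minimal sketch of the generic textbook form: documents and the query become term-weight vectors, and documents are ranked by cosine similarity to the query. It is not the authors' method and does not reproduce any of their simplifications, which the abstract does not describe. The function names (`tokenize`, `rank`), the whitespace tokenizer, and the `log(N / df) + 1` weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real indexing.
    return text.lower().split()


def rank(query, docs):
    """Return (doc_index, score) pairs, highest cosine similarity first."""
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    # Assumed idf variant: log(N / df) + 1, so a term that occurs in
    # every document still carries a small non-zero weight.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

    def vectorize(tokens):
        # Sparse tf-idf vector; terms unseen in the collection are dropped.
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = vectorize(tokenize(query))
    scores = [(i, cosine(query_vec, vectorize(tokens)))
              for i, tokens in enumerate(doc_tokens)]
    # sorted() is stable, so tied documents keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "text retrieval with keyword indexing",
    "parallel search on large machines",
    "cooking pasta at home",
]
for index, score in rank("keyword text retrieval", docs):
    print(index, round(score, 3))
```

Only the first document shares terms with the query, so it is ranked first and the other two score zero. The sketch recomputes every document vector for each query, which is the kind of cost that motivates trading ranking quality against processing efficiency.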
