Simplified similarity scoring using term ranks

We propose a method for document ranking that combines a simple document-centric view of text with fast evaluation strategies developed in connection with the vector space model. The new method defines the importance of a term within a document qualitatively rather than quantitatively, and in doing so reduces the need for tuning parameters. In addition, the method supports very fast query processing: most of the computation is carried out on small integers, and dynamic pruning is an effective option. Experiments on a wide range of TREC data show that the new method provides retrieval effectiveness as good as or better than the Okapi BM25 formulation and variants of language models.
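The abstract describes term importance assigned by rank rather than by raw frequency, with query-time work done on small integers. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' exact formulation. It assumes that terms are ranked by within-document frequency (ties broken alphabetically) and grouped into buckets of sizes 1, 2, 4, 8, …, and that each bucket receives an integer impact counting down from a hypothetical `num_levels = 8`. A document's score is the sum of the impacts of the query terms it contains. The `max_postings` parameter is a hypothetical, crude stand-in for dynamic pruning: only the highest-impact postings are processed.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def rank_impacts(tokens: List[str], num_levels: int = 8) -> Dict[str, int]:
    """Map each distinct term to a small integer impact based on its frequency
    rank in the document (assumed bucketing scheme, for illustration only)."""
    counts = Counter(tokens)
    # Most frequent terms first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    impacts: Dict[str, int] = {}
    start, size, level = 0, 1, num_levels
    while start < len(ranked):
        for term in ranked[start:start + size]:
            impacts[term] = level
        start += size
        size *= 2                   # bucket sizes 1, 2, 4, 8, ...
        level = max(level - 1, 1)   # lowest impact is clamped at 1
    return impacts


def build_index(docs: Dict[str, List[str]]) -> Dict[str, List[Tuple[int, str]]]:
    """Inverted index whose postings are (impact, doc_id) pairs,
    sorted so that the highest impacts come first."""
    index: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for doc_id, tokens in docs.items():
        for term, impact in rank_impacts(tokens).items():
            index[term].append((impact, doc_id))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[0], p[1]))
    return dict(index)


def search(index: Dict[str, List[Tuple[int, str]]], query: List[str],
           k: int = 10, max_postings: Optional[int] = None) -> List[Tuple[str, int]]:
    """Score documents by summing the integer impacts of matching query terms."""
    postings = [p for term in set(query) for p in index.get(term, [])]
    # Process the highest-impact postings first (impact-ordered evaluation).
    postings.sort(key=lambda p: (-p[0], p[1]))
    if max_postings is not None:
        postings = postings[:max_postings]  # crude early termination
    accumulators: Dict[str, int] = defaultdict(int)
    for impact, doc_id in postings:
        accumulators[doc_id] += impact      # integer-only accumulation
    # Highest score first; ties broken by doc_id.
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    docs = {"d1": "cat sat on the mat the cat".split(),
            "d2": "the dog chased the cat".split()}
    index = build_index(docs)
    print(search(index, ["cat"]))  # [('d1', 8), ('d2', 7)]
```

Because every impact is a small integer, accumulation needs no floating-point arithmetic, and cutting off the impact-ordered postings early may trade some effectiveness for speed. A practical system would store compressed, impact-ordered postings and process them segment by segment rather than sorting them for every query.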

Alistair Moffat | Vo Ngoc Anh