Index structures for information filtering under the vector space model

The authors study which data structures and algorithms can efficiently support large-scale information filtering under the vector space model, a retrieval model established as effective. They apply the idea of the standard inverted index to the indexing of user profiles. They also devise an alternative to the standard inverted index that indexes only the significant terms of each profile instead of every term. They evaluate the performance of these methods and show that indexing requires orders of magnitude fewer I/Os to process a document than processing without an index. They also show that, in many cases, the proposed alternative outperforms the standard inverted index in both I/O and CPU processing time.
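The abstract does not spell out the algorithms, so the following is only a minimal sketch under our own assumptions. Profiles and documents are term-to-weight dicts, and similarity is cosine. Each profile carries a hypothetical relevance threshold. "Significant" terms are chosen by a hypothetical rule: post a profile's highest-weight terms until the norm of its unindexed weights falls below its threshold. By Cauchy-Schwarz, a document sharing none of the indexed terms then cannot reach the threshold, so the postings only need to produce candidates. The full similarity is computed afterwards from the stored profile. The names `ProfileIndex`, `add_profile`, `match` and `normalize` are illustrative. The sketch counts postings rather than modeling the I/O costs the paper measures.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a term->weight dict to unit Euclidean length (for cosine similarity)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class ProfileIndex:
    """Hypothetical inverted index from terms to (profile_id, weight) postings.

    selective=False: every profile term is posted (standard inverted index).
    selective=True: only the highest-weight terms are posted, stopping once the
    norm of the remaining unindexed weights drops below the profile threshold
    (an assumed selection rule, not necessarily the authors' exact one).
    """

    def __init__(self, selective=False):
        self.selective = selective
        self.postings = defaultdict(list)  # term -> [(profile_id, weight)]
        self.profiles = {}                 # profile_id -> (unit vector, threshold)

    def add_profile(self, pid, vec, threshold):
        vec = normalize(vec)
        self.profiles[pid] = (vec, threshold)
        terms = sorted(vec, key=vec.get, reverse=True)
        if self.selective:
            remaining = sum(w * w for w in vec.values())  # squared norm of unindexed part
            chosen = []
            for t in terms:
                if math.sqrt(remaining) < threshold:
                    break  # unindexed terms alone can no longer reach the threshold
                chosen.append(t)
                remaining = max(0.0, remaining - vec[t] ** 2)  # guard float error
            terms = chosen
        for t in terms:
            self.postings[t].append((pid, vec[t]))

    def match(self, doc):
        """Return the sorted ids of profiles whose cosine similarity >= threshold."""
        doc = normalize(doc)
        if not self.selective:
            # Standard inverted index: accumulate partial dot products per profile.
            scores = defaultdict(float)
            for t, dw in doc.items():
                for pid, pw in self.postings.get(t, ()):
                    scores[pid] += pw * dw
            return sorted(pid for pid, s in scores.items()
                          if s >= self.profiles[pid][1])
        # Selective index: postings only yield candidate profiles; the full
        # similarity is then computed against each stored profile vector.
        candidates = {pid for t in doc for pid, _ in self.postings.get(t, ())}
        result = []
        for pid in candidates:
            vec, threshold = self.profiles[pid]
            score = sum(w * doc.get(t, 0.0) for t, w in vec.items())
            if score >= threshold:
                result.append(pid)
        return sorted(result)


full, selective = ProfileIndex(), ProfileIndex(selective=True)
for idx in (full, selective):
    idx.add_profile("db", {"database": 3, "index": 2, "query": 1}, 0.5)
    idx.add_profile("ir", {"retrieval": 3, "ranking": 2, "query": 1}, 0.5)
doc = {"database": 2, "index": 1, "query": 1}
print(full.match(doc), selective.match(doc))  # ['db'] ['db']
```

In this toy example both indexes return the same match. The selective index stores 4 postings instead of 6 because neither profile posts its low-weight term `query`. The trade-off is the extra step of fetching the full profile for each candidate.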