Experiments with a component theory of probabilistic information retrieval based on single terms as document components

A component theory of information retrieval using single content terms as components of queries and documents was reviewed and tested experimentally. The theory has the advantages of being able to (1) bootstrap itself, that is, define initial term weights naturally based on the fact that items are self-relevant; (2) make use of within-item term frequencies; (3) account for query-focused and document-focused indexing and retrieval strategies cooperatively; and (4) allow for component-specific feedback if such information is available. Retrieval results with four collections support the effectiveness of the first three aspects, except in predictive retrieval. At the initial indexing stage, the retrieval theory performed much more consistently across collections than Croft's model and provided results comparable to Salton's tf*idf approach. An inverse collection term frequency (ICTF) formula was also tested and performed much better than the inverse document frequency (IDF). With full-feedback retrospective retrieval, the component theory performed substantially better than Croft's model because of the highly specific nature of document-focused feedback. Repetitive retrieval results with partial relevance feedback mirrored those for the retrospective case. However, for the important case of predictive retrieval using residual ranking, the results were equivocal.
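The abstract does not state the ICTF or IDF formulas, so the following is only a minimal sketch of the distinction between the two statistics, assuming the common textbook forms log(N / n_t) for IDF and log(T / F_t) for ICTF. Here N is the number of documents, n_t the number of documents containing term t, T the total number of term occurrences in the collection, and F_t the number of occurrences of t in the collection. The function name `idf_and_ictf` and the toy collection are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def idf_and_ictf(docs):
    """Compute illustrative IDF and ICTF weights for a tokenized collection.

    docs: list of documents, each a list of term strings.
    Assumed forms (not the paper's exact formulas):
      IDF(t)  = log(N / n_t)   with document counts
      ICTF(t) = log(T / F_t)   with occurrence counts
    """
    n_docs = len(docs)
    total_tokens = sum(len(d) for d in docs)

    doc_freq = Counter()   # n_t: number of documents containing t
    coll_freq = Counter()  # F_t: total occurrences of t in the collection
    for d in docs:
        coll_freq.update(d)
        doc_freq.update(set(d))

    idf = {t: math.log(n_docs / doc_freq[t]) for t in doc_freq}
    ictf = {t: math.log(total_tokens / coll_freq[t]) for t in coll_freq}
    return idf, ictf


# Toy collection: "retrieval" and "term" each occur in two documents,
# but "retrieval" occurs four times overall and "term" only twice.
docs = [
    ["retrieval", "retrieval", "retrieval", "term"],
    ["retrieval", "index"],
    ["term", "weight"],
]
idf, ictf = idf_and_ictf(docs)
```

In this toy collection the two terms receive identical IDF weights because they appear in the same number of documents, while ICTF separates them because it counts occurrences and so reflects within-item term frequencies.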