A probabilistic model of information retrieval: development and comparative experiments - Part 2

The paper combines a comprehensive account of a probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations. Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, and briefly considers other environment conditions and tasks as well as model training. It concludes with comparisons with other approaches and an overall assessment. Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2.
