Combining Approaches to Information Retrieval

The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
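
To make the idea of combining the outputs of multiple classifiers concrete, the following is a minimal sketch, not the method of this paper. It assumes each retrieval system returns a dictionary of document scores, rescales each system's scores to [0, 1], and ranks documents by a weighted sum. The names `min_max_normalize` and `combine`, the normalization choice, and the treatment of missing documents as contributing zero are all illustrative assumptions.

```python
from typing import Dict, List

Scores = Dict[str, float]  # hypothetical format: document id -> retrieval score


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one system's scores to [0, 1] so different systems are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally (fully) supported.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(runs: List[Scores], weights: List[float]) -> List[str]:
    """Rank documents by a weighted sum of normalized scores.

    Assumption: a document missing from a run contributes 0 for that run.
    Ties are broken by document id to keep the output deterministic.
    """
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    combined: Dict[str, float] = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


if __name__ == "__main__":
    # Two made-up runs over overlapping document sets.
    run_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
    run_b = {"d2": 0.9, "d3": 0.8, "d4": 0.1}
    print(combine([run_a, run_b], [1.0, 1.0]))
```

In this toy input, a document scored moderately well by both systems can outrank a document scored highly by only one, which is the kind of effect a combination model has to account for. Unequal weights would let a more reliable system count for more.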
