A decision theoretic approach to combining information filters: An analytical and empirical evaluation

The outputs of several information filtering (IF) systems can be combined to improve filtering performance. In this article the authors propose and explore a framework, based on the information structure (IS) model frequently used in information economics, for combining the output of multiple IF systems according to each user's preferences (profile). The combination seeks to maximize the expected payoff to that user. The authors show analytically that the proposed framework increases users' expected payoff from the combined filtering output for any user preferences. An experiment using the TREC-6 test collection confirms the theoretical findings.

Introduction

Many retrieval evaluation studies support the hypothesis that combining retrieval output from several systems enhances the benefits of the individual systems, resulting in improved effectiveness of the combined results (see Croft for a recent review of combination approaches and studies). Croft describes several aspects of information retrieval (IR) combination: (a) the combination of multiple representations of documents in a single search; (b) the combination of different queries as additional evidence of the searcher's information needs; (c) the combination of ranking algorithms; and (d) the combination of output from different search systems. In the current study we perform the last type of combination; that is, we combine the outputs of two information filtering (IF) systems to maximize user utility. Information filtering systems seek to select precisely the relevant documents from an incoming stream of information. This entails a dual objective: to maximize the relevant information and to minimize the nonrelevant information sent to users.
Information filtering systems typically support users having long-term information needs, which may be expressed as a …

We propose a combination framework consistent with the information structure (IS) model used in information economics to evaluate the value of information (Marschak, 1971). This model traces its origin to fundamental work in statistics and the theory of games, which viewed the design of experiments as a game against nature (Blackwell & Girshick, 1954/1979). The idea of modeling IR in decision-theoretic terms was first presented by Kraft and Bookstein (1978), who also proposed several measures of overall retrieval performance. They showed that maximizing precision could be equivalent, under certain conditions, to maximizing the expected value of the IR system. In the IS model users represent their preferences by a payoff matrix (Ronen & Spector, 1995). The model seeks the optimal decision strategy based on those preferences (McGuire & Radner, 1986). Elovici, Shapira, and Kantor (2003) presented a …
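
The payoff-matrix formulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the numbers, and in particular the assumption that the two filters' outputs are conditionally independent given a document's relevance are illustrative choices made here. Each filter is modeled as an information structure, that is, a matrix of P(signal | state); for each signal, the strategy picks the action with the highest expected payoff under the user's payoff matrix.

```python
from itertools import product


def optimal_strategy(prior, likelihood, payoff):
    """Return (best action per signal, expected payoff of that strategy).

    prior[s]         -- P(state s), e.g. s=0 relevant, s=1 nonrelevant
    likelihood[s][y] -- P(signal y | state s), the information structure
    payoff[a][s]     -- user's payoff for action a when the state is s
    """
    strategy, total = [], 0.0
    for y in range(len(likelihood[0])):
        # Joint probability of (state s, signal y) for every state.
        joint = [p * row[y] for p, row in zip(prior, likelihood)]
        # Expected payoff contribution of each action given signal y.
        scores = [sum(j * u for j, u in zip(joint, row)) for row in payoff]
        best = max(range(len(payoff)), key=scores.__getitem__)
        strategy.append(best)
        total += scores[best]
    return strategy, total


def combine_independent(lik_a, lik_b):
    """Joint information structure of two filters.

    ASSUMPTION (illustrative only): the filters' signals are conditionally
    independent given the state, so P(ya, yb | s) = P(ya | s) * P(yb | s).
    Joint signals are ordered (0,0), (0,1), (1,0), (1,1), ...
    """
    return [[pa * pb for pa, pb in product(row_a, row_b)]
            for row_a, row_b in zip(lik_a, lik_b)]


# Hypothetical numbers, chosen only for illustration.
prior = [0.2, 0.8]                       # P(relevant), P(nonrelevant)
payoff = [[3.0, -1.0],                   # action 0: send the document
          [-1.0, 0.0]]                   # action 1: discard the document
filter_a = [[0.8, 0.2], [0.2, 0.8]]      # signal 0 = flagged, 1 = not flagged
filter_b = [[0.7, 0.3], [0.1, 0.9]]

_, value_a = optimal_strategy(prior, filter_a, payoff)
_, value_b = optimal_strategy(prior, filter_b, payoff)
joint_strategy, value_ab = optimal_strategy(
    prior, combine_independent(filter_a, filter_b), payoff)

print("filter A alone:", value_a)
print("filter B alone:", value_b)
print("combined:", value_ab, "strategy per joint signal:", joint_strategy)
```

Because a decision maker who sees both signals can always ignore one of them, the combined expected payoff in this sketch can never fall below that of either filter alone, whatever payoff matrix is supplied. This mirrors the kind of guarantee the abstract describes, although the framework in the article itself may define the combined structure differently.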

[1]  Peretz Shoval,et al.  Information Filtering: Overview of Issues, Research and Systems , 2001, User Modeling and User-Adapted Interaction.

[2]  James P. Callan,et al.  Passage-level evidence in document retrieval , 1994, SIGIR '94.

[3]  Ellen M. Voorhees,et al.  Learning collection fusion strategies , 1995, SIGIR '95.

[4]  Nicholas J. Belkin,et al.  The effect of multiple query representations on information retrieval system performance , 1993, SIGIR.

[5]  Peter Schäuble,et al.  Improving a Basic Retrieval Method by Links and Passage Level Evidence , 1994, TREC.

[6]  W. Bruce Croft,et al.  A retrieval model incorporating hypertext links , 1989, Hypertext.

[7]  W. Bruce Croft,et al.  Using Probabilistic Models of Document Retrieval without Relevance Information , 1979, J. Documentation.

[8]  Garrison W. Cottrell,et al.  Predicting the performance of linearly combined IR systems , 1998, SIGIR '98.

[9]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci.

[10]  Boaz Ronen,et al.  Evaluating sampling strategy under two criteria , 1995 .

[11]  Niv Ahituv,et al.  Orthogonal Information Structures: a Model to Evaluate the Information Provided by a Second Opinion , 1986 .

[12]  Paul B. Kantor,et al.  Using the Information Structure Model to Compare Profile-Based Information Filtering Systems , 2004, Information Retrieval.

[13]  Mark D. Dunlop,et al.  Image retrieval by hypertext links , 1997, SIGIR '97.

[14]  Edward A. Fox,et al.  Combination of Multiple Searches , 1993, TREC.

[15]  D. Blackwell Comparison of Experiments , 1951 .

[16]  Curtis Daniel MacDougall,et al.  The decision and the organization , 1965 .

[17]  Paul Thompson Description of the PRC CEO Algorithm for TREC , 1992, TREC.

[18]  Hinrich Schütze,et al.  Method combination for document filtering , 1996, SIGIR '96.

[19]  Anahí Gallardo Velázquez,et al.  Conference , 1969, Journal of Neuroscience Methods.

[20]  D. R. Elchesen,et al.  General: Effectiveness of Combining Title Words and Index Terms in Machine Retrieval Searches , 1972, Nature.

[21]  E. Rowland Theory of Games and Economic Behavior , 1946, Nature.

[22]  Donald H. Kraft,et al.  Evaluation of information retrieval systems: A decision theory approach , 1978, J. Am. Soc. Inf. Sci.

[23]  Paul B. Kantor,et al.  A study of information seeking and retrieving. III. Searchers, searches, and overlap , 1988, J. Am. Soc. Inf. Sci.

[24]  James Allan,et al.  Approaches to passage retrieval in full text information systems , 1993, SIGIR.

[25]  Jong-Hak Lee,et al.  Analyses of multiple evidence combination , 1997, SIGIR '97.

[26]  Kagan Tumer,et al.  Linear and Order Statistics Combiners for Pattern Classification , 1999, ArXiv.

[27]  Paul B. Kantor,et al.  Decision Level Data Fusion for Routing of Documents in the TREC3 Context: A Base Case Analysis of Worst Case Results , 1994, TREC.

[28]  M. Shubik,et al.  Theory of Games and Statistical Decisions. , 1955 .

[29]  Paul B. Kantor,et al.  Predicting the effectiveness of naïve data fusion on the basis of system characteristics , 2000, J. Am. Soc. Inf. Sci.

[30]  Edward A. Fox,et al.  Coefficients of combining concept classes in a collection , 1988, SIGIR '88.

[31]  H Austin,et al.  The economics of information systems. , 1986, Computers in healthcare.

[32]  W. Bruce Croft Combining Approaches to Information Retrieval , 2002 .

[33]  Mark E. Frisse,et al.  Information retrieval from hypertext: update on the dynamic medical handbook project , 1989, Hypertext.

[34]  Joon Ho Lee,et al.  Combining multiple evidence from different properties of weighting schemes , 1995, SIGIR '95.

[35]  W. Bruce Croft,et al.  Retrieval of Complex Objects , 1992, EDBT.

[36]  Justin Zobel,et al.  Passage retrieval revisited , 1997, SIGIR '97.

[37]  Cyril Cleverdon,et al.  The Cranfield tests on index language devices , 1997 .

[38]  Garrison W. Cottrell,et al.  Fusion Via a Linear Combination of Scores , 1999, Information Retrieval.

[39]  W. Bruce Croft,et al.  Combining Automatic and Manual Index Representations in Probabilistic Retrieval , 1995, J. Am. Soc. Inf. Sci.

[40]  W. Bruce Croft,et al.  Evaluation of an inference network-based retrieval model , 1991, TOIS.

[41]  Chris Buckley,et al.  A probabilistic learning approach for document indexing , 1991, TOIS.

[42]  Douglas W. Oard,et al.  The State of the Art in Text Filtering , 1997, User Modeling and User-Adapted Interaction.

[43]  Ronald W. Hilton Failure of Blackwell's Theorem under Machina's generalization of expected-utility analysis without the independence axiom , 1990 .