Combinatorial Fusion Analysis for Meta Search Information Retrieval

Leading commercial search engines are built as single event systems: in response to a query, an engine returns a single ranked list of results. To find more relevant results, users must often try several other search engines. Meta search engines were developed to streamline this multi-engine querying: a meta search engine queries several engines at the same time and fuses their individual results into a single result list. Fusing multiple search results has been shown, mostly experimentally, to be highly effective. However, the questions of why and how fusion should be done remain largely unanswered. In this chapter, we use the combinatorial fusion analysis proposed by Hsu et al. to analyze the combination and fusion of multiple sources of information. A rank/score function is used in the design and analysis of our framework. The framework provides a better understanding of the fusion phenomenon in information retrieval. For example, for a combination of multiple scoring systems to improve performance, each individual scoring system must perform relatively well and the individual systems must be diverse. We also illustrate applications of the framework with two examples from the information retrieval domain.
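The fusion step described above can be made concrete with a small sketch. It is illustrative only and is not taken from the chapter: the function names, the min-max normalization, the plain averaging of two systems, and the mean-absolute-difference diversity measure are all assumptions chosen for brevity. It treats each search engine as a scoring system over the same set of documents, derives a rank function from the scores, builds a rank/score function (the score found at each rank position), and fuses two systems by score and by rank.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> score assigned by one engine


def normalize(scores: Scores) -> Scores:
    """Min-max normalize scores to [0, 1] (an assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rank_function(scores: Scores) -> Dict[str, int]:
    """Rank 1 is the highest score; ties are broken by document id."""
    ordered = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return {doc: position + 1 for position, doc in enumerate(ordered)}


def rank_score_function(scores: Scores) -> List[float]:
    """Entry i-1 holds the normalized score of the document at rank i."""
    return sorted(normalize(scores).values(), reverse=True)


def score_combination(a: Scores, b: Scores) -> Scores:
    """Average of normalized scores; a higher value is better."""
    na, nb = normalize(a), normalize(b)
    return {doc: (na[doc] + nb[doc]) / 2 for doc in a}


def rank_combination(a: Scores, b: Scores) -> Dict[str, float]:
    """Average of ranks; a lower value is better."""
    ra, rb = rank_function(a), rank_function(b)
    return {doc: (ra[doc] + rb[doc]) / 2 for doc in a}


def diversity(a: Scores, b: Scores) -> float:
    """Mean absolute gap between the two rank/score functions (assumed measure)."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


# Hypothetical scores from two engines for the same three documents.
engine_a = {"d1": 9.0, "d2": 5.0, "d3": 1.0}
engine_b = {"d1": 2.0, "d2": 8.0, "d3": 7.0}

fused_scores = score_combination(engine_a, engine_b)
fused_ranks = rank_combination(engine_a, engine_b)
by_score = sorted(fused_scores, key=lambda doc: -fused_scores[doc])
by_rank = sorted(fused_ranks, key=lambda doc: fused_ranks[doc])
print(by_score, by_rank, diversity(engine_a, engine_b))
```

The sketch assumes both engines score exactly the same documents. In a real meta search setting the result lists overlap only partially, so missing documents would need a default score or rank before fusion.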
