Fusion analysis of information retrieval models on biomedical collections

A variety of efforts have been made to improve the performance of traditional information retrieval models in the biomedical domain. However, most studies have focused on improving individual information retrieval models, while few have investigated combining multiple models and exploring their interactions in biomedical information retrieval. In this study, a comprehensive performance evaluation of seven popular generic information retrieval models is conducted on a biomedical literature collection. In addition, an information fusion method called Combinatorial Fusion Analysis is applied to perform extensive combinatorial experiments on these models. Our experimental results demonstrate that a combination of multiple information retrieval models can outperform a single model only if the individual models differ in their scoring and ranking behavior and each performs relatively well.
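To make the idea of combining retrieval models concrete, the following is a minimal Python sketch of two common fusion operations, score combination and rank combination. It is an illustration under stated assumptions and not the procedure used in this study: the function names, the min-max normalization, the simple averaging, and the toy scores for `model_a` and `model_b` are all hypothetical, and every model is assumed to score the same set of documents.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> retrieval score (higher is better)


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so different models are comparable (assumed choice)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def to_ranks(scores: Scores) -> Dict[str, int]:
    """Convert scores to ranks; rank 1 is the best. Ties are broken by document id."""
    ordered = sorted(scores, key=lambda doc: (-scores[doc], doc))
    return {doc: position + 1 for position, doc in enumerate(ordered)}


def score_combination(systems: List[Scores]) -> Scores:
    """Average the normalized scores of each document across models."""
    normalized = [min_max_normalize(s) for s in systems]
    return {doc: sum(n[doc] for n in normalized) / len(normalized)
            for doc in normalized[0]}


def rank_combination(systems: List[Scores]) -> Dict[str, float]:
    """Average the ranks of each document across models (lower is better)."""
    ranks = [to_ranks(s) for s in systems]
    return {doc: sum(r[doc] for r in ranks) / len(ranks) for doc in ranks[0]}


# Hypothetical scores from two models that rank the documents differently.
model_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
model_b = {"d1": 0.2, "d2": 0.9, "d3": 0.5}

fused_scores = score_combination([model_a, model_b])
fused_ranks = rank_combination([model_a, model_b])

# Final orderings: descending by fused score, ascending by fused rank.
order_by_score = sorted(fused_scores, key=fused_scores.get, reverse=True)
order_by_rank = sorted(fused_ranks, key=fused_ranks.get)
```

In this toy case the two models disagree on the top document, and both fusion variants promote `d2`, which each model places in its top two. This is only meant to show how differing ranking behavior can change the fused result.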
