A Survival Modeling Approach to Biomedical Search Result Diversification Using Wikipedia

In this paper, we propose a survival modeling approach to promoting ranking diversity for biomedical information retrieval. The proposed approach is concerned with finding relevant documents that together cover a wider range of aspects of a query. First, two probabilistic models derived from survival analysis theory are proposed for measuring aspect novelty. Second, a new method that uses Wikipedia to detect the aspects covered by retrieved documents is presented. Third, an aspect filter based on a two-stage model is introduced; it ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, the relevance and the novelty of retrieved documents are combined at the aspect level for reranking. Experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed approach in promoting ranking diversity for biomedical information retrieval. We further evaluate our approach in the Web retrieval environment, where results on the ClueWeb09-T09B collection show that it can achieve promising performance improvements.
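The idea of combining relevance and novelty at the aspect level can be illustrated with a minimal sketch. It is not the models proposed in the paper, whose details the abstract does not give. It assumes an exponential survival function as a stand-in novelty measure, a greedy selection loop, and precomputed per-aspect relevance weights; the names `aspect_novelty`, `rerank`, and `rate` are hypothetical.

```python
import math


def aspect_novelty(times_seen, rate=1.0):
    # ASSUMPTION: novelty is modeled with an exponential survival function
    # S(t) = exp(-rate * t), where t counts how often the aspect has already
    # appeared in the documents ranked so far. The paper's two models may differ.
    return math.exp(-rate * times_seen)


def rerank(docs, rate=1.0):
    # docs: list of (doc_id, {aspect: relevance_weight}) pairs (hypothetical input
    # format; aspect detection and filtering are assumed to have been done already).
    seen = {}                 # aspect -> number of already-ranked documents covering it
    remaining = list(docs)
    ranking = []

    def score(item):
        # Aspect-level combination: each aspect's relevance weight is discounted
        # by how novel that aspect still is.
        _, aspects = item
        return sum(w * aspect_novelty(seen.get(a, 0), rate)
                   for a, w in aspects.items())

    while remaining:
        best = max(remaining, key=score)   # greedy pick of the best remaining document
        remaining.remove(best)
        ranking.append(best[0])
        for a in best[1]:
            seen[a] = seen.get(a, 0) + 1   # the chosen document's aspects become less novel
    return ranking
```

With three toy documents, `rerank([("d1", {"a": 0.9}), ("d2", {"a": 0.8}), ("d3", {"b": 0.5})])`, the sketch places `d3` ahead of `d2`: once `d1` has covered aspect `a`, a second document on the same aspect is discounted below a less relevant document that covers the unseen aspect `b`.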
