Using Prior Information Derived from Citations in Literature Search

Researchers spend much of their time searching through an ever-increasing number of scientific articles. Users of scientific literature search engines prefer results ranked by the number of citations a publication has received, but it is unknown whether this notion of authoritativeness also benefits more traditional, objective measures of retrieval quality. Is it also an indicator of relevance, given an information need? In this paper, we examine the relationship between the citation features of a scientific article and its prior probability of actually being relevant to an information need. We propose several ways of modeling this relationship and show how this kind of contextual information can be incorporated into a language modeling framework. We experiment with three document priors, which we evaluate on three distinct sets of queries and two document collections from the TREC Genomics track. Empirical results show that two of the proposed priors significantly improve retrieval effectiveness, measured in terms of mean average precision.
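To make the idea of a document prior concrete: in the standard language modeling approach, documents are ranked by P(D|Q), which is proportional to P(D) times the product of P(q|D) over the query terms q. A uniform P(D) leaves the ranking to the query likelihood alone, while a non-uniform P(D) can encode query-independent evidence such as citations. The sketch below is only an illustration under stated assumptions: the prior proportional to 1 + citation count, the Jelinek-Mercer smoothing weight `lam`, and all function names are hypothetical choices, not the priors or settings evaluated in the paper.

```python
import math
from collections import Counter


def log_query_likelihood(query, doc, coll_tf, coll_len, lam=0.5):
    """log P(Q|D) with Jelinek-Mercer smoothing (lam weights the collection model)."""
    tf = Counter(doc)
    total = 0.0
    for term in query:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection: skipped for every document
        p_doc = tf[term] / len(doc) if doc else 0.0
        total += math.log((1 - lam) * p_doc + lam * p_coll)
    return total


def log_citation_prior(citations, all_citations):
    """Hypothetical prior (an assumption): P(D) proportional to 1 + citation count."""
    normalizer = sum(1 + c for c in all_citations)
    return math.log(1 + citations) - math.log(normalizer)


def rank(query, docs, citations, lam=0.5):
    """Return document indices sorted by log P(D) + log P(Q|D), best first."""
    coll_tf = Counter(term for doc in docs for term in doc)
    coll_len = sum(len(doc) for doc in docs)
    scored = []
    for i, doc in enumerate(docs):
        score = log_query_likelihood(query, doc, coll_tf, coll_len, lam)
        score += log_citation_prior(citations[i], citations)
        scored.append((score, i))
    scored.sort(key=lambda pair: -pair[0])
    return [i for _, i in scored]
```

With two documents of identical text, the query likelihoods tie and the prior alone decides the order, so the more frequently cited document is ranked first. Working in log space keeps the product of small probabilities numerically stable and turns the prior into a simple additive term.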