Retrieval Models Versus Retrievability

Retrievability is an important measure in information retrieval (IR) for analysing retrieval models and document collections. Rather than focusing only on the small set of documents covered by relevance judgements, retrievability examines what is retrieved, how frequently it is retrieved, and how much effort is needed to retrieve it. Such a measure is of particular interest in recall-oriented retrieval settings (e.g. patent or legal retrieval), because a document must be retrieved before it can be judged for relevance. If a retrieval model makes some patents hard to find, patent searchers may miss relevant documents purely because of the model's bias. In this chapter we explain the concept of retrievability in information retrieval, show how it can be estimated, and describe how it can be used to analyse the retrieval bias of retrieval models. We also show how retrievability relates to effectiveness, analysing the relationship between retrievability and effectiveness measures and how the retrievability measure can be used to improve effectiveness.
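
In the formulation commonly used in the retrievability literature (introduced by Azzopardi and Vinay), the retrievability of a document d is estimated over a large set of queries Q:

    r(d) = \sum_{q \in Q} o_q \cdot f(k_{dq}, c)

where o_q is a weight reflecting the likelihood or importance of query q, k_{dq} is the rank at which query q retrieves d, and f is a utility function. The simple cumulative variant sets f(k, c) = 1 if k <= c and 0 otherwise, so r(d) counts how many queries retrieve d within the rank cutoff c. The bias of a retrieval model is then summarised by the inequality of the r(d) values across the collection, typically via the Lorenz curve and the Gini coefficient: a Gini of 0 means every document is equally retrievable, while values near 1 indicate that retrieval is concentrated on a few documents.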

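As a minimal illustration of this estimation, the following Python sketch computes cumulative retrievability scores from a set of ranked result lists and summarises their inequality with the Gini coefficient. The data layout (a dict from query id to ranked document ids), the uniform query weights o_q = 1, and all function names are illustrative assumptions, not the chapter's implementation.

```python
def retrievability_scores(run, collection, c=100):
    """Cumulative retrievability r(d) with uniform query weights.

    run:        dict mapping each query id to its ranked list of doc ids
                (illustrative layout, not a standard run format).
    collection: all document ids; documents never retrieved keep r(d) = 0,
                which is essential when measuring bias.
    c:          rank cutoff; f(k, c) = 1 if k <= c, else 0.
    """
    r = dict.fromkeys(collection, 0.0)
    for query, ranking in run.items():
        for rank, doc in enumerate(ranking, start=1):
            if rank > c:
                break
            r[doc] += 1.0  # o_q = 1 and f(k, c) = 1 for k <= c
    return r

def gini(values):
    """Gini coefficient of non-negative retrievability scores.

    0 = every document equally retrievable; values near 1 = retrieval
    mass concentrated on a few documents (strong bias).
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

# Toy example: two queries over a four-document collection, cutoff c = 2.
run = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1"]}
r = retrievability_scores(run, collection=["d1", "d2", "d3", "d4"], c=2)
print(r)                 # {'d1': 2.0, 'd2': 2.0, 'd3': 0.0, 'd4': 0.0}
print(gini(r.values()))  # 0.5 -- half the collection is unreachable
```

Comparing this Gini value across retrieval models (e.g. TF-IDF versus BM25) on the same collection and query set is the basic retrieval-bias analysis the chapter describes.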