Can Deep Effectiveness Metrics Be Evaluated Using Shallow Judgment Pools?
[1] Thorsten Joachims, et al. Unbiased Comparative Evaluation of Ranking Functions, 2016, ICTIR.
[2] Tetsuya Sakai, et al. Alternatives to Bpref, 2007, SIGIR.
[3] Allan Hanbury, et al. The Solitude of Relevant Documents in the Pool, 2016, CIKM.
[4] Emine Yilmaz, et al. Estimating average precision when judgments are incomplete, 2007, Knowledge and Information Systems.
[5] Ellen M. Voorhees, et al. Bias and the limits of pooling for large collections, 2007, Information Retrieval.
[6] Justin Zobel, et al. How reliable are the results of large-scale information retrieval experiments?, 1998, SIGIR '98.
[7] Emine Yilmaz, et al. A statistical method for system evaluation using incomplete judgments, 2006, SIGIR.
[8] J. Shane Culpepper, et al. Improving test collection pools with machine learning, 2014, ADCS.
[9] Tetsuya Sakai. Comparing metrics across TREC and NTCIR: the robustness to system bias, 2008, CIKM '08.
[10] Alistair Moffat, et al. Users versus models: what observation tells us about effectiveness metrics, 2013, CIKM.
[11] Olivier Chapelle, et al. Expected reciprocal rank for graded relevance, 2009, CIKM.
[12] Alistair Moffat, et al. Score Estimation, Incomplete Judgments, and Significance Testing in IR Evaluation, 2010, AIRS.
[13] Stephen Robertson, et al. On Document Populations and Measures of IR Effectiveness, 2007.
[14] Tetsuya Sakai. Comparing metrics across TREC and NTCIR: the robustness to pool depth bias, 2008, SIGIR '08.
[15] Dirk Van, et al. Ensemble Methods: Foundations and Algorithms, 2012.
[16] Thorsten Joachims, et al. Unbiased Ranking Evaluation on a Budget, 2015, WWW.
[17] Emine Yilmaz, et al. A simple and efficient sampling method for estimating AP and NDCG, 2008, SIGIR '08.
[18] Allan Hanbury, et al. Splitting Water: Precision and Anti-Precision to Reduce Pool Bias, 2015, SIGIR.
[19] José Luis Vicedo González, et al. TREC: Experiment and evaluation in information retrieval, 2007, J. Assoc. Inf. Sci. Technol.
[20] A. Chao, et al. Estimating the Number of Classes via Sample Coverage, 1992.
[21] Nicola Ferro, et al. Bridging Between Information Retrieval and Databases, 2014, Lecture Notes in Computer Science.
[22] Ellen M. Voorhees, et al. The effect of sampling strategy on inferred measures, 2014, SIGIR.
[23] Charles L. A. Clarke, et al. Reliable information retrieval evaluation with incomplete and biased judgements, 2007, SIGIR.
[24] Ellen M. Voorhees, et al. Retrieval evaluation with incomplete information, 2004, SIGIR '04.
[25] J. Shane Culpepper, et al. The effect of pooling and evaluation depth on IR metrics, 2016, Information Retrieval Journal.
[26] Laurence Anthony F. Park, et al. Score adjustment for correction of pooling bias, 2009, SIGIR.
[27] Alistair Moffat, et al. Rank-biased precision for measurement of retrieval effectiveness, 2008, TOIS.
[28] Alistair Moffat, et al. Strategic system comparisons via targeted relevance judgments, 2007, SIGIR.
[29] Tetsuya Sakai, et al. Metrics, Statistics, Tests, 2013, PROMISE Winter School.
[30] J. Shane Culpepper, et al. Modeling Relevance as a Function of Retrieval Rank, 2016, AIRS.
[31] Sergei Vassilvitskii, et al. Generalized distances between rankings, 2010, WWW '10.