Unbiased Comparative Evaluation of Ranking Functions
Thorsten Joachims | Peter I. Frazier | Adith Swaminathan | Tobias Schnabel