Prioritizing relevance judgments to improve the construction of IR test collections
Ingemar J. Cox | Mehdi Hosseini | Natasa Milic-Frayling | Vishwa Vinay | Trevor Sweeting