Evaluating Relevance Judgments with Pairwise Discriminative Power
Yiqun Liu | Tetsuya Sakai | Zhumin Chu | Min Zhang | Fan Zhang | Jiaxin Mao | Shaoping Ma
[1] M. D. Rijke,et al. Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF , 2019, Information Retrieval Evaluation in a Changing World.
[2] Tetsuya Sakai,et al. Good Evaluation Measures based on Document Preferences , 2020, SIGIR.
[3] Olivier Chapelle,et al. Expected reciprocal rank for graded relevance , 2009, CIKM.
[4] Mark D. Smucker,et al. Offline Evaluation by Maximum Similarity to an Ideal Ranking , 2020, CIKM.
[5] Ben Carterette,et al. Using preference judgments for novel document retrieval , 2012, SIGIR '12.
[6] Stefano Mizzaro,et al. A Formal Account of Effectiveness Evaluation and Ranking Fusion , 2018, ICTIR.
[7] Tetsuya Sakai,et al. Evaluating evaluation metrics based on the bootstrap , 2006, SIGIR.
[8] D. Harman,et al. TREC: Experiment and Evaluation in Information Retrieval , 2006 .
[9] Cheng Luo,et al. Overview of the NTCIR-13 We Want Web Task , 2017, NTCIR.
[10] Ben Carterette,et al. An Analysis of Assessor Behavior in Crowdsourced Preference Judgments , 2010 .
[11] M. Rosenblatt. A Central Limit Theorem and a Strong Mixing Condition , 1956, Proceedings of the National Academy of Sciences of the United States of America.
[12] David Maxwell Chickering,et al. Here or There , 2008, ECIR.
[13] Haldun Akoglu,et al. User's guide to correlation coefficients , 2018, Turkish journal of emergency medicine.
[14] Charles L. A. Clarke,et al. Overview of the TREC 2004 Terabyte Track , 2004, TREC.
[15] Tetsuya Sakai,et al. Evaluating Information Retrieval and Access Tasks: NTCIR's Legacy of Research Impact , 2021 .
[16] B. Everitt,et al. Large sample standard errors of kappa and weighted kappa. , 1969 .
[17] Eddy Maddalena,et al. On Transforming Relevance Scales , 2019, CIKM.
[18] Ellen M. Voorhees,et al. TREC 2014 Web Track Overview , 2015, TREC.
[19] Cyril W. Cleverdon,et al. Aslib Cranfield research project - Factors determining the performance of indexing systems; Volume 1, Design; Part 2, Appendices , 1966 .
[20] Eddy Maddalena,et al. Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing , 2017, HCOMP.
[21] José Luis Vicedo González,et al. TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol..
[22] Falk Scholer,et al. The Benefits of Magnitude Estimation Relevance Assessments for Information Retrieval Evaluation , 2015, SIGIR.
[23] Eero Sormunen,et al. Liberal relevance criteria of TREC: counting on negligible documents? , 2002, SIGIR '02.
[24] Alistair Moffat,et al. Pairwise Crowd Judgments: Preference, Absolute, and Ratio , 2018, ADCS.
[25] Rong Tang,et al. Towards the Identification of the Optimal Number of Relevance Categories , 1999, J. Am. Soc. Inf. Sci..
[26] A. Viera,et al. Understanding interobserver agreement: the kappa statistic. , 2005, Family medicine.
[27] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.
[28] Peter Bailey,et al. Relevance assessment: are judges exchangeable and does it matter , 2008, SIGIR '08.
[29] Eddy Maddalena,et al. On Fine-Grained Relevance Scales , 2018, SIGIR.
[30] Jaana Kekäläinen,et al. Cumulated gain-based evaluation of IR techniques , 2002, TOIS.
[31] Shie Mannor,et al. A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..
[32] John Guiver,et al. Bayesian inference for Plackett-Luce ranking models , 2009, ICML '09.
[33] Gregory N. Hullender,et al. Learning to rank using gradient descent , 2005, ICML.
[34] Klaus Krippendorff,et al. Computing Krippendorff's Alpha-Reliability , 2011 .
[35] Zhicheng Dou,et al. Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task , 2020 .
[36] Sang Joon Kim,et al. A Mathematical Theory of Communication , 2006 .
[37] Ben Carterette,et al. Preference based evaluation measures for novelty and diversity , 2013, SIGIR.
[38] Jacob Cohen. A Coefficient of Agreement for Nominal Scales , 1960 .
[39] Alistair Moffat,et al. Rank-biased precision for measurement of retrieval effectiveness , 2008, TOIS.
[40] Ellen M. Voorhees,et al. Overview of TREC 2003 , 2003, TREC.
[41] Charles L. A. Clarke,et al. Offline Evaluation without Gain , 2020, ICTIR.
[42] Tetsuya Sakai,et al. Evaluating diversified search results using per-intent graded relevance , 2011, SIGIR.