Gauging the Quality of Relevance Assessments using Inter-Rater Agreement
[1] A. Viera et al. Understanding interobserver agreement: the kappa statistic. Family Medicine, 2005.
[2] Gabriella Kazai et al. An analysis of human factors and label accuracy in crowdsourcing relevance judgments. Information Retrieval, 2013.
[3] J. Shane Culpepper et al. The Influence of Topic Difficulty, Relevance Level, and Document Ordering on Relevance Judging. ADCS, 2016.
[4] Paul Solomon et al. Toward an Understanding of the Dynamics of Relevance Judgment: An Analysis of One Person's Search Behavior. Inf. Process. Manag., 1998.
[5] Ellen M. Voorhees et al. Variations in relevance judgments and the measurement of retrieval effectiveness. SIGIR, 1998.
[6] Falk Scholer et al. Metric and Relevance Mismatch in Retrieval Evaluation. AIRS, 2009.
[7] Joseph L. Fleiss et al. Balanced Incomplete Block Designs for Inter-Rater Reliability Studies. 1981.
[8] Eero Sormunen et al. Liberal relevance criteria of TREC: counting on negligible documents? SIGIR, 2002.
[9] Mark Sanderson et al. Relevance judgments between TREC and Non-TREC assessors. SIGIR, 2008.
[10] Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. Inf. Process. Manag., 2000.
[11] J. Shane Culpepper et al. The Effect of Document Order and Topic Difficulty on Assessor Agreement. ICTIR, 2016.
[12] T. Saracevic et al. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. J. Assoc. Inf. Sci. Technol., 2007.
[13] Stefano Mizzaro. Relevance: the whole history. 1997.
[14] Mark Sanderson et al. Quantifying test collection quality based on the consistency of relevance judgements. SIGIR, 2011.
[15] Klaus Krippendorff et al. Answering the Call for a Standard Reliability Measure for Coding Data. 2007.
[16] J. Shane Culpepper et al. The effect of pooling and evaluation depth on IR metrics. Information Retrieval Journal, 2016.
[17] Jianqiang Wang. Accuracy, Agreement, Speed, and Perceived Difficulty of Users' Relevance Judgments for E-Discovery. 2011.
[18] Omar Alonso et al. Using crowdsourcing for TREC relevance assessment. Inf. Process. Manag., 2012.
[19] Peter Bailey et al. Relevance assessment: are judges exchangeable and does it matter. SIGIR, 2008.