Studying Topical Relevance with Evidence-based Crowdsourcing
Lora Aroyo | Elena Paslaru Bontas Simperl | Evangelos Kanoulas | Dan Li | Giannis Haralabopoulos | Oana Inel | Zoltán Szlávik | Christophe Van Gysel