An analysis of human factors and label accuracy in crowdsourcing relevance judgments
[1] John Le, et al. Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution, 2010.
[2] Ben Carterette, et al. The effect of assessor error on IR system evaluation, 2010, SIGIR.
[3] Gabriella Kazai, et al. Worker types and personality traits in crowdsourcing relevance labels, 2011, CIKM '11.
[4] Ben Carterette, et al. An Analysis of Assessor Behavior in Crowdsourced Preference Judgments, 2010.
[5] Aniket Kittur, et al. Crowdsourcing user studies with Mechanical Turk, 2008, CHI.
[6] Lorrie Faith Cranor, et al. Are your participants gaming the system?: Screening Mechanical Turk workers, 2010, CHI.
[7] Dana Chandler, et al. Preventing Satisficing in Online Surveys: A "Kapcha" to Ensure Higher Quality Data, 2010.
[8] Daniel M. Russell, et al. Query logs alone are not enough, 2007.
[9] and software — performance evaluation.
[10] Stefanie Nowak, et al. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation, 2010, MIR '10.
[11] Brian A. Vander Schee. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business, 2009.
[12] Matthew Lease, et al. On Quality Control and Machine Learning in Crowdsourcing, 2011, Human Computation.
[13] Aniket Kittur, et al. Instrumenting the crowd: using implicit behavioral measures to predict task performance, 2011, UIST.
[14] Phuoc Tran-Gia, et al. Anatomy of a Crowdsourcing Platform - Using the Example of Microworkers.com, 2011, 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing.
[15] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.
[16] Siddharth Suri, et al. Conducting behavioral research on Amazon's Mechanical Turk, 2010, Behavior Research Methods.
[17] Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness, 2000, Inf. Process. Manag.
[18] Omar Alonso, et al. Crowdsourcing for relevance evaluation, 2008, SIGIR Forum.
[19] Bill Tomlinson, et al. Who are the crowdworkers?: Shifting demographics in Mechanical Turk, 2010, CHI Extended Abstracts.
[20] Ray C. Fair, et al. Principles of economics, 1993.
[21] Filip Radlinski, et al. How does clickthrough data reflect retrieval quality?, 2008, CIKM '08.
[22] L. Festinger, et al. Cognitive consequences of forced compliance, 1959, Journal of Abnormal and Social Psychology.
[23] Tara S. Behrend, et al. The viability of crowdsourcing for survey research, 2011, Behavior Research Methods.
[24] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.
[25] David C. Parkes, et al. The role of game theory in human computation systems, 2009, HCOMP '09.
[26] Panagiotis G. Ipeirotis, et al. Quality management on Amazon Mechanical Turk, 2010, HCOMP '10.
[27] Cyril Cleverdon, et al. The Cranfield tests on index language devices, 1997.
[28] Matthew Lease, et al. Crowdsourcing for search evaluation, 2011, SIGIR Forum.
[29] J. R. Landis, et al. The measurement of observer agreement for categorical data, 1977, Biometrics.
[30] Omar Alonso, et al. Crowdsourcing Assessments for XML Ranked Retrieval, 2010, ECIR.
[31] Ricardo Baeza-Yates, et al. Design and Implementation of Relevance Assessments Using Crowdsourcing, 2011, ECIR.
[32] Gabriella Kazai, et al. Towards methods for the collective gathering and quality control of relevance assessments, 2009, SIGIR.
[33] Jeroen B. P. Vuurens, et al. How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy, 2011.
[34] Aaron D. Shaw, et al. Designing incentives for inexpert human raters, 2011, CSCW.
[35] Peter Bailey, et al. Relevance assessment: are judges exchangeable and does it matter, 2008, SIGIR '08.
[36] Charles L. A. Clarke, et al. Efficient construction of large test collections, 1998, SIGIR '98.
[37] Gabriella Kazai, et al. Crowdsourcing for book search evaluation: impact of HIT design on comparative system ranking, 2011, SIGIR.
[38] Bill Tomlinson, et al. Sellers' problems in human computation markets, 2010, HCOMP '10.
[39] Gjergji Kasneci, et al. Bayesian Knowledge Corroboration with Logical Rules and User Feedback, 2010, ECML/PKDD.
[40] Mark E. J. Newman, et al. Power-Law Distributions in Empirical Data, 2007, SIAM Rev.
[41] Elizabeth F. Churchill, et al. Logging the Search Self-Efficacy of Amazon Mechanical Turkers, 2010.
[42] Gabriella Kazai, et al. Overview of the INEX 2008 Book Track, 2009, INEX.
[43] Panagiotis G. Ipeirotis. Demographics of Mechanical Turk, 2010.
[44] Benjamin B. Bederson, et al. Human computation: a survey and taxonomy of a growing field, 2011, CHI.
[45] Laura A. Dabbish, et al. Labeling images with a computer game, 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.
[46] Ellen M. Voorhees, et al. TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing), 2005.
[47] Panagiotis G. Ipeirotis. Analyzing the Amazon Mechanical Turk marketplace, 2010, XRDS.
[48] Alon Y. Halevy, et al. Crowdsourcing systems on the World-Wide Web, 2011, Commun. ACM.
[49] Gabriella Kazai, et al. Overview of the TREC 2012 Crowdsourcing Track, 2012, TREC.
[50] Hajo Hippner, et al. Crowdsourcing, 2012, Business & Information Systems Engineering.
[51] Matthew Lease, et al. Crowdsourcing Document Relevance Assessment with Mechanical Turk, 2010, Mturk@HLT-NAACL.
[52] A. N. Oppenheim. Questionnaire Design and Attitude Measurement, 1966.
[53] Duncan J. Watts, et al. Financial incentives and the "performance of crowds", 2009, HCOMP '09.
[54] Andrew Trotman, et al. Comparative analysis of clicks and judgments for IR evaluation, 2009, WSCD '09.
[55] Pietro Perona, et al. The Multidimensional Wisdom of Crowds, 2010, NIPS.
[56] A. P. de Vries, et al. How Crowdsourcable is Your Task?, 2011.
[57] Gabriella Kazai, et al. In Search of Quality in Crowdsourcing for Search Engine Evaluation, 2011, ECIR.