Creation of Reliable Relevance Judgments in Information Retrieval Systems Evaluation Experimentation through Crowdsourcing: A Review
[1] Mohammad Soleymani et al. Crowdsourcing for Affective Annotation of Video: Development of a Viewer-reported Boredom Corpus, 2010.
[2] Gianluca Demartini et al. Mechanical Cheat: Spamming Schemes and Adversarial Techniques on Crowdsourcing Platforms, 2012, CrowdSearch.
[3] Gerardo Hermosillo et al. Learning From Crowds, 2010, J. Mach. Learn. Res.
[4] Chuang Zhang et al. Real-time quality control for crowdsourcing relevance evaluation, 2012, 3rd IEEE International Conference on Network Infrastructure and Digital Content.
[5] Ben Carterette et al. An Analysis of Assessor Behavior in Crowdsourced Preference Judgments, 2010.
[6] Gabriella Kazai et al. Crowdsourcing for book search evaluation: impact of HIT design on comparative system ranking, 2011, SIGIR.
[7] Mark D. Smucker et al. The Crowd vs. the Lab: A Comparison of Crowd-Sourced and University Laboratory Participant Behavior, 2011.
[8] Ricardo Baeza-Yates et al. Modern Information Retrieval: the concepts and technology behind search, Second edition, 2011.
[9] A. P. Dawid et al. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm, 1979.
[10] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.
[11] Brendan T. O'Connor et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.
[12] Aniket Kittur et al. Crowdsourcing user studies with Mechanical Turk, 2008, CHI.
[13] Siddharth Suri et al. Conducting behavioral research on Amazon's Mechanical Turk, 2010, Behavior Research Methods.
[14] Omar Alonso et al. Crowdsourcing for relevance evaluation, 2008, SIGIR Forum.
[15] Kathryn T. Stolee et al. Exploring the use of crowdsourcing to support empirical studies in software engineering, 2010, ESEM '10.
[16] Bill Tomlinson et al. Who are the crowdworkers?: shifting demographics in Mechanical Turk, 2010, CHI Extended Abstracts.
[17] Victor Kuperman et al. Crowdsourcing and language studies: the new generation of linguistic data, 2010, Mturk@HLT-NAACL.
[18] Omar Alonso et al. Using crowdsourcing for TREC relevance assessment, 2012, Inf. Process. Manag.
[19] Jiayu Tang et al. Examining the Limits of Crowdsourcing for Relevance Assessment, 2013, IEEE Internet Computing.
[20] Rose Holley. Crowdsourcing and social engagement: potential, power and freedom for libraries and users, 2009.
[21] Qinghua Zhu et al. Evaluation on crowdsourcing research: Current status and future direction, 2012, Information Systems Frontiers.
[22] Panagiotis G. Ipeirotis et al. Running Experiments on Amazon Mechanical Turk, 2010, Judgment and Decision Making.
[23] Ingemar J. Cox et al. On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents, 2012, ECIR.
[24] Matthew Lease et al. Crowdsourcing Document Relevance Assessment with Mechanical Turk, 2010, Mturk@HLT-NAACL.
[25] Jeffrey Heer et al. Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design, 2010, CHI.
[26] Matthew Lease et al. Semi-Supervised Consensus Labeling for Crowdsourcing, 2011.
[27] Cyril Cleverdon et al. The Cranfield tests on index language devices, 1997.
[28] Panagiotis G. Ipeirotis et al. Get another label? Improving data quality and data mining using multiple, noisy labelers, 2008, KDD.
[29] Aniket Kittur et al. CrowdForge: crowdsourcing complex work, 2011, UIST.
[30] Duncan J. Watts et al. Financial incentives and the "performance of crowds", 2009, HCOMP '09.
[31] Gabriella Kazai et al. An analysis of human factors and label accuracy in crowdsourcing relevance judgments, 2013, Information Retrieval.
[32] Jenny Chen et al. Opportunities for Crowdsourcing Research on Amazon Mechanical Turk, 2011.
[33] Ricardo Baeza-Yates et al. Design and Implementation of Relevance Assessments Using Crowdsourcing, 2011, ECIR.
[34] Panagiotis G. Ipeirotis. Demographics of Mechanical Turk, 2010.
[35] Benjamin B. Bederson et al. Human computation: a survey and taxonomy of a growing field, 2011, CHI.
[36] Omar Alonso et al. Implementing crowdsourcing-based relevance experimentation: an industrial perspective, 2013, Information Retrieval.
[37] Chris Callison-Burch et al. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk, 2009, EMNLP.
[38] Matthew Lease et al. Improving Quality of Crowdsourced Labels via Probabilistic Matrix Factorization, 2012, HCOMP@AAAI.
[39] Benno Stein et al. An Evaluation Framework for Plagiarism Detection, 2010, COLING.
[40] Gabriella Kazai et al. In Search of Quality in Crowdsourcing for Search Engine Evaluation, 2011, ECIR.
[41] Arjen P. de Vries et al. Increasing cheat robustness of crowdsourcing tasks, 2013, Information Retrieval.
[42] Derek Greene et al. Using Crowdsourcing and Active Learning to Track Sentiment in Online Media, 2010, ECAI.
[43] Pietro Perona et al. Online crowdsourcing: Rating annotators and obtaining cost-effective labels, 2010, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[44] Hajo Hippner et al. Crowdsourcing, 2012, Business & Information Systems Engineering.
[45] Iadh Ounis et al. Crowdsourcing a News Query Classification Dataset, 2010.
[46] Tobias Hoßfeld et al. Analyzing costs and accuracy of validation mechanisms for crowdsourcing platforms, 2013, Math. Comput. Model.
[47] Jaime G. Carbonell et al. Active Learning and Crowd-Sourcing for Machine Translation, 2010, LREC.
[48] Daren C. Brabham. Crowdsourcing the Public Participation Process for Planning Projects, 2009.
[49] Luca de Alfaro et al. Reputation systems for open collaboration, 2011, Commun. ACM.
[50] Ruslan Salakhutdinov et al. Probabilistic Matrix Factorization, 2007, NIPS.
[51] Scott R. Klemmer et al. Shepherding the crowd: managing and providing feedback to crowd workers, 2011, CHI Extended Abstracts.
[52] Zihui Ge et al. Crowdsourcing service-level network event monitoring, 2010, SIGCOMM '10.
[53] Sri Devi Ravana et al. Low-cost evaluation techniques for information retrieval systems: A review, 2013, J. Informetrics.
[54] Chrysanthos Dellarocas et al. Harnessing Crowds: Mapping the Genome of Collective Intelligence, 2009.
[55] Klaus Krippendorff et al. Estimating the Reliability, Systematic Error and Random Error of Interval Data, 1970.
[56] John Le et al. Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution, 2010.
[57] Gabriella Kazai et al. Worker types and personality traits in crowdsourcing relevance labels, 2011, CIKM '11.
[58] Panagiotis G. Ipeirotis et al. Repeated labeling using multiple noisy labelers, 2012, Data Mining and Knowledge Discovery.
[59] James Davis et al. Evaluating and improving the usability of Mechanical Turk for low-income workers in India, 2010, ACM DEV '10.
[60] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.
[61] Elisa Bertino et al. Quality Control in Crowdsourcing Systems: Issues and Directions, 2013, IEEE Internet Computing.
[62] Panagiotis G. Ipeirotis et al. Quality management on Amazon Mechanical Turk, 2010, HCOMP '10.
[63] Matthew Lease et al. Inferring missing relevance judgments from crowd workers via probabilistic matrix factorization, 2012, SIGIR '12.
[64] Alias-i, Inc. Multilevel Bayesian Models of Categorical Data Annotation, 2008.
[65] Emine Yilmaz et al. Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems, 2012, Information Retrieval.
[66] John Langford et al. CAPTCHA: Using Hard AI Problems for Security, 2003, EUROCRYPT.
[67] Giuseppe Piro et al. HetNets Powered by Renewable Energy Sources: Sustainable Next-Generation Cellular Networks, 2013, IEEE Internet Computing.
[68] Eli Blevis et al. A survey of crowdsourcing as a means of collaboration and the implications of crowdsourcing for interaction design, 2011, International Conference on Collaboration Technologies and Systems (CTS).
[69] Gabriella Kazai et al. The face of quality in crowdsourcing relevance labels: demographics, personality and labeling accuracy, 2012, CIKM.
[70] Jeroen B. P. Vuurens et al. How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy, 2011.
[71] Manuel Blum et al. reCAPTCHA: Human-Based Character Recognition via Web Security Measures, 2008, Science.
[72] Arjen P. de Vries et al. Obtaining High-Quality Relevance Judgments Using Crowdsourcing, 2012, IEEE Internet Computing.
[73] Björn Hartmann et al. Collaboratively crowdsourcing workflows with Turkomatic, 2012, CSCW.
[74] Schahram Dustdar et al. Modeling Rewards and Incentive Mechanisms for Social BPM, 2012, BPM.
[75] Ellen M. Voorhees et al. The Philosophy of Information Retrieval Evaluation, 2001, CLEF.