Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

This survey provides a comprehensive overview of the landscape of crowdsourcing research, targeted at the machine learning community. We begin with an overview of the ways in which crowdsourcing can be used to advance machine learning research, focusing on four application areas: 1) data generation, 2) evaluation and debugging of models, 3) hybrid intelligence systems that leverage the complementary strengths of humans and machines to expand the capabilities of AI, and 4) crowdsourced behavioral experiments that improve our understanding of how humans interact with machine learning systems and technology more broadly. We next review the extensive literature on the behavior of crowdworkers themselves. This research, which explores the prevalence of dishonesty among crowdworkers, how workers respond to both monetary incentives and intrinsic forms of motivation, and how crowdworkers interact with each other, has immediate implications that we distill into best practices that researchers should follow when using crowdsourcing in their own research. We conclude with a discussion of additional tips and best practices that are crucial to the success of any project that uses crowdsourcing, but rarely mentioned in the literature.
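
As a concrete illustration of the first application area, the sketch below (not drawn from the survey itself; the task, data, and worker IDs are hypothetical) shows the most common first step in crowdsourced data generation: aggregating redundant, noisy worker labels into a single label per item. It starts from a plain majority vote and then reweights each worker by agreement with the current consensus, which is the intuition behind EM-style aggregators in the spirit of Dawid and Skene (1979).

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each item was labeled by several workers.
# votes[item] = list of (worker_id, label) pairs.
votes = {
    "img_001": [("w1", "cat"), ("w2", "cat"), ("w3", "dog")],
    "img_002": [("w1", "dog"), ("w2", "dog"), ("w3", "dog")],
    "img_003": [("w1", "cat"), ("w2", "bird"), ("w3", "cat")],
}

# Step 1: plain majority vote per item (ties broken arbitrarily).
def majority(labels):
    return Counter(labels).most_common(1)[0][0]

estimate = {item: majority([lab for _, lab in vs]) for item, vs in votes.items()}

# Step 2: score each worker by agreement with the current consensus.
agreement = defaultdict(list)
for item, vs in votes.items():
    for worker, lab in vs:
        agreement[worker].append(lab == estimate[item])
weights = {w: sum(a) / len(a) for w, a in agreement.items()}

# Step 3: one round of weighted voting using the worker scores;
# iterating steps 2-3 to convergence gives a Dawid-Skene-style aggregator.
for item, vs in votes.items():
    tally = defaultdict(float)
    for worker, lab in vs:
        tally[lab] += weights[worker]
    estimate[item] = max(tally, key=tally.get)

print(estimate)  # {'img_001': 'cat', 'img_002': 'dog', 'img_003': 'cat'}
```

In practice, much of the literature surveyed here recommends going further than raw agreement: collecting several labels per item and estimating worker quality against embedded gold questions with known answers, so that a confident but consistently wrong worker is not mistaken for a reliable one.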


[212]  Carlos Guestrin,et al.  Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance , 2016, ArXiv.

[213]  Michael S. Bernstein,et al.  We Are Dynamo: Overcoming Stalling and Friction in Collective Action for Crowd Workers , 2015, CHI.

[214]  Christopher T. Lowenkamp,et al.  False Positives, False Negatives, and False Analyses: A Rejoinder to "Machine Bias: There's Software Used across the Country to Predict Future Criminals. and It's Biased against Blacks" , 2016 .

[215]  Michael D. Buhrmester,et al.  Amazon's Mechanical Turk , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[216]  Panagiotis G. Ipeirotis,et al.  Get another label? improving data quality and data mining using multiple, noisy labelers , 2008, KDD.

[217]  Sanjeev Khanna,et al.  Top-k and Clustering with Noisy Comparisons , 2014, ACM Trans. Database Syst..

[218]  Juho Hamari,et al.  Does Gamification Work? -- A Literature Review of Empirical Studies on Gamification , 2014, 2014 47th Hawaii International Conference on System Sciences.

[219]  J. Baron,et al.  Do patients trust computers , 2006 .

[220]  Devavrat Shah,et al.  Iterative Learning for Reliable Crowdsourcing Systems , 2011, NIPS.

[221]  Kannan Ramchandran,et al.  Truth Serums for Massively Crowdsourced Evaluation Tasks , 2015, ArXiv.

[222]  Yiling Chen,et al.  Output Agreement Mechanisms and Common Knowledge , 2014, HCOMP.

[223]  Yu-An Sun,et al.  The Effects of Performance-Contingent Financial Incentives in Online Labor Markets , 2013, AAAI.

[224]  Johannes Gehrke,et al.  Accurate intelligible models with pairwise interactions , 2013, KDD.

[225]  Nihar B. Shah,et al.  Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing , 2014, J. Mach. Learn. Res..