Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics in Machine Learning

The increased use of machine learning in recent years has led to large volumes of data being manually labeled through crowdsourced microtasks completed by human workers. This has brought about dehumanization effects: task requesters overlook the humans behind the tasks, leading to ethical issues (e.g., unfair payment) and to the amplification of human biases, which are transferred into training data and affect machine learning systems deployed in the real world. We propose a framework that allocates microtasks while accounting for human factors of workers, such as demographics and compensation. We deployed our framework on a popular crowdsourcing platform and conducted experiments with 1,919 workers, collecting 160,345 human judgments. By routing microtasks to workers based on demographics and appropriate pay, our framework mitigates biases in the contributor sample and increases the hourly pay given to contributors. We discuss potential extensions of the framework and how it can promote transparency in crowdsourcing.
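
To make the routing idea concrete, the following Python snippet is a minimal sketch of demographic- and pay-aware task allocation. It is not the authors' implementation: the class names, the minimum-wage threshold, and the target-share structure are all illustrative assumptions. A task is assigned to a worker only if its effective hourly rate meets a minimum and the worker's demographic groups are still under their target share of the contributor sample.

from dataclasses import dataclass, field
from collections import Counter

@dataclass
class Worker:
    worker_id: str
    demographics: dict            # e.g. {"gender": "female", "region": "IN"}

@dataclass
class Task:
    task_id: str
    est_minutes: float            # estimated completion time in minutes
    reward_usd: float             # payment offered for completing the task

@dataclass
class Router:
    # e.g. {"gender": {"female": 0.5, "male": 0.5}} -- hypothetical quota spec
    target_shares: dict
    min_hourly_usd: float = 7.25  # assumed minimum hourly rate
    total_assignments: int = 0
    group_counts: Counter = field(default_factory=Counter)

    def hourly_rate(self, task: Task) -> float:
        # Convert the per-task reward into an effective hourly rate.
        return task.reward_usd * 60.0 / task.est_minutes

    def should_assign(self, worker: Worker, task: Task) -> bool:
        # Ethics check: reject assignments that would pay below the floor.
        if self.hourly_rate(task) < self.min_hourly_usd:
            return False
        if self.total_assignments == 0:
            return True
        # Bias check: assign only if each of the worker's demographic groups
        # is still below its target share of the contributor sample so far.
        for attr, targets in self.target_shares.items():
            value = worker.demographics.get(attr)
            share = self.group_counts[(attr, value)] / self.total_assignments
            if share >= targets.get(value, 1.0):
                return False
        return True

    def assign(self, worker: Worker, task: Task) -> bool:
        # Record the assignment and update the demographic counters.
        if not self.should_assign(worker, task):
            return False
        self.total_assignments += 1
        for attr in self.target_shares:
            self.group_counts[(attr, worker.demographics.get(attr))] += 1
        return True

A router configured with target_shares={"gender": {"female": 0.5, "male": 0.5}} would, under this sketch, stop offering a task to workers of a group once that group already makes up half of the collected judgments, while refusing any assignment whose pay falls below the configured hourly floor.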
