Managing Bias in Human-Annotated Data: Moving Beyond Bias Removal

Due to the widespread use of data-powered systems in our everyday lives, the notions of bias and fairness have gained significant attention among researchers and practitioners, in both industry and academia. Such issues typically emerge from the data used to train these systems, which comes with varying levels of quality. With the commercialization and deployment of such systems, which are sometimes delegated to make life-changing decisions, significant effort is being made towards the identification and removal of possible sources of bias that may surface to the end user. In this position paper, we instead argue that bias is not something that should necessarily be removed in all cases, and that attention and effort should shift from bias removal to the identification, measurement, indexing, surfacing, and adjustment of bias, which we name bias management. We argue that, if correctly managed, bias can be a resource that can be made transparent to users and empower them to make informed choices about their experience with the system.
