Accounting for Confirmation Bias in Crowdsourced Label Aggregation

Collecting large-scale human-annotated datasets via crowdsourcing to train and improve automated models is a prominent human-in-the-loop approach for integrating human and machine intelligence. However, along with their unique intelligence, humans also bring their biases and subjective beliefs, which may degrade the quality of the annotated data and negatively impact the effectiveness of human-in-the-loop systems. One of the most common cognitive biases that humans are subject to is confirmation bias, the tendency to favor information that confirms one's existing beliefs and values. In this paper, we present an algorithmic approach to infer the correct answers to tasks by aggregating the annotations from multiple crowd workers, while taking workers' varying levels of confirmation bias into account. Evaluations on real-world crowd annotations show that the proposed bias-aware label aggregation algorithm outperforms baseline methods in accurately inferring the ground-truth labels of different tasks when crowd workers indeed exhibit some degree of confirmation bias. Through simulations on synthetic data, we further identify the conditions under which the proposed algorithm has the largest advantages over baseline methods.
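
The abstract does not spell out the aggregation model, so the following is only a minimal sketch of how a bias-aware label aggregator could look: a one-coin, Dawid-Skene-style EM in which each worker either copies a prior belief (with per-worker probability b_i, the confirmation-bias strength) or answers according to the true label with accuracy p_i. The function name bias_aware_em, the binary-label setting, and the assumption that each worker's per-task prior belief is observable are illustrative choices, not details taken from the paper.

```python
import numpy as np

def bias_aware_em(answers, beliefs, n_iter=50, eps=1e-9):
    """EM-style aggregation of binary crowd labels with a per-worker
    confirmation-bias component (illustrative sketch, not the paper's model).

    answers : (n_workers, n_tasks) float array with values in {0, 1};
              np.nan marks tasks a worker did not annotate.
    beliefs : (n_workers, n_tasks) array in {0, 1}; the label each worker is
              assumed to believe before seeing the task (this sketch assumes
              such prior beliefs are available, e.g. from a pre-task survey).

    Returns (posterior P(z_j = 1), per-worker accuracy p, per-worker bias b).
    """
    observed = ~np.isnan(answers)
    A = np.where(observed, answers, 0.0)          # dummy-fill missing answers
    match_belief = (A == beliefs) & observed      # answer confirms prior belief

    n_workers, _ = A.shape
    p = np.full(n_workers, 0.75)                  # accuracy when not biased
    b = np.full(n_workers, 0.20)                  # prob. of copying the belief
    pi = 0.5                                      # class prior P(z = 1)

    for _ in range(n_iter):
        # Likelihood of each observed answer under each candidate true label z.
        lik = {}
        for z in (0, 1):
            correct = (A == z) & observed
            lik[z] = np.clip(
                b[:, None] * match_belief
                + (1 - b[:, None]) * np.where(correct, p[:, None], 1 - p[:, None]),
                eps, 1.0)

        # E-step: posterior over true labels ...
        log1 = np.log(pi) + np.where(observed, np.log(lik[1]), 0.0).sum(axis=0)
        log0 = np.log(1 - pi) + np.where(observed, np.log(lik[0]), 0.0).sum(axis=0)
        post = 1.0 / (1.0 + np.exp(np.clip(log0 - log1, -500, 500)))

        # ... and responsibility that an answer came from the bias component.
        r = {z: b[:, None] * match_belief / lik[z] for z in (0, 1)}
        r_mix = post * r[1] + (1 - post) * r[0]

        # M-step: re-estimate bias strength, accuracy, and the class prior.
        b = (r_mix * observed).sum(axis=1) / (observed.sum(axis=1) + eps)
        unbiased = post * (1 - r[1]) + (1 - post) * (1 - r[0])
        hit = post * (1 - r[1]) * (A == 1) + (1 - post) * (1 - r[0]) * (A == 0)
        p = ((hit * observed).sum(axis=1) + eps) / ((unbiased * observed).sum(axis=1) + 2 * eps)
        pi = np.clip(post.mean(), 1e-3, 1 - 1e-3)

    return post, p, b
```

Setting every b_i to zero recovers a standard one-coin Dawid-Skene-style EM, so the bias component is the only departure from conventional aggregation in this sketch; whether the paper models confirmation bias in this particular way cannot be determined from the abstract alone.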
