Trustworthy Human Computation: A Survey

Human computation is an approach to solving problems that remain difficult for AI alone; it relies on the cooperation of many humans. Because human computation requires close engagement with both “human populations as users” and “human populations as driving forces,” establishing mutual trust between AI and humans is essential to its further development. This survey lays the groundwork for realizing trustworthy human computation. First, the trustworthiness of human computation as a computing system, that is, the trust humans place in AI, is examined through an analogy with RAS (Reliability, Availability, and Serviceability), the classical measures of trustworthiness in conventional computer systems. Next, the social trustworthiness that human computation systems offer to users and participants is discussed from the perspective of AI ethics, including fairness, privacy, and transparency. We then consider human–AI collaboration based on two-way trust, in which humans and AI build mutual trust and accomplish difficult tasks through reciprocal collaboration. Finally, we discuss open challenges and research directions for realizing trustworthy human computation.
