Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge

A growing body of research has explored how to support humans in making better use of AI-based decision support, including via training and onboarding. Existing research has focused on decision-making tasks where it is possible to evaluate “appropriate reliance” by comparing each decision against a ground truth label that cleanly maps to both the AI’s predictive target and the human decision-maker’s goals. However, this assumption does not hold in many real-world settings where AI tools are deployed today (e.g., social work, criminal justice, and healthcare). In this paper, we introduce a process-oriented notion of appropriate reliance called critical use that centers the human’s ability to situate AI predictions against knowledge that is uniquely available to them but unavailable to the AI model. To explore how training can support critical use, we conduct a randomized online experiment in a complex social decision-making setting: child maltreatment screening. We find that, when given accelerated, low-stakes opportunities to practice AI-assisted decision-making in this setting, novices came to exhibit patterns of disagreement with AI that resemble those of experienced workers. A qualitative examination of participants’ explanations for their AI-assisted decisions revealed that they drew upon qualitative case narratives, to which the AI model did not have access, to learn when (and when not) to rely on AI predictions. Our findings open new questions for the study and design of training for real-world AI-assisted decision-making.
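To make the abstract’s central distinction concrete, the sketch below contrasts an outcome-based “appropriate reliance” score, which requires a ground truth label for every decision, with a process-oriented disagreement profile of the kind that can be compared between novices and experienced workers when no such label exists. This is an illustration only, not the paper’s measure or method: the function names, the binary screening data, and all rates and thresholds are hypothetical assumptions invented for the example.

```python
# Illustrative sketch only. Nothing here is from the paper: the functions,
# simulated decisions, and override rates are hypothetical, constructed to
# contrast the two evaluation framings described in the abstract.
import numpy as np

def outcome_based_appropriate_reliance(decisions, ai_preds, ground_truth):
    """Outcome-oriented framing: a reliance decision counts as 'appropriate'
    only by checking each final decision against a ground truth label that is
    assumed to map cleanly onto both the AI's target and the human's goal."""
    followed_ai = decisions == ai_preds
    ai_correct = ai_preds == ground_truth
    # Appropriate = relied when the AI was right, or overrode when it was wrong.
    return np.mean(followed_ai == ai_correct)

def process_oriented_disagreement_profile(decisions, ai_preds):
    """Process-oriented framing (closer in spirit to 'critical use'): with no
    clean ground truth, characterize *when* a decision-maker departs from the
    AI (here, a simple override rate), so that novices' disagreement patterns
    can be compared against experienced workers' rather than against labels."""
    return np.mean(decisions != ai_preds)

rng = np.random.default_rng(0)
ai_preds = rng.integers(0, 2, size=200)  # hypothetical binary AI screening calls
# Hypothetical humans who override the AI on ~15% / ~18% of cases:
novice = np.where(rng.random(200) < 0.15, 1 - ai_preds, ai_preds)
expert = np.where(rng.random(200) < 0.18, 1 - ai_preds, ai_preds)

print("novice override rate:", process_oriented_disagreement_profile(novice, ai_preds))
print("expert override rate:", process_oriented_disagreement_profile(expert, ai_preds))

# The outcome-based score is computable only if labels like these exist,
# which is precisely the assumption the paper questions:
ground_truth = rng.integers(0, 2, size=200)  # hypothetical; often unavailable
print("novice 'appropriate reliance' (needs labels):",
      outcome_based_appropriate_reliance(novice, ai_preds, ground_truth))
```

The design point the sketch is meant to surface: the first measure is undefined without a trustworthy label, while the second remains computable in settings like child maltreatment screening, where the paper instead asks whether novices’ disagreement patterns come to resemble experienced workers’.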
