Explainable Active Learning (XAL)

The wide adoption of Machine Learning (ML) technologies has created a growing demand for people who can train ML models. Some have advocated the term "machine teacher" for people who inject domain knowledge into ML models. This "teaching" perspective emphasizes supporting the productivity and mental wellbeing of machine teachers through efficient learning algorithms and thoughtful design of human-AI interfaces. One promising learning paradigm is Active Learning (AL), in which the model intelligently selects instances to query a machine teacher for labels, so that the labeling workload can be greatly reduced. However, in current AL settings the human-AI interface remains minimal and opaque, and a dearth of empirical studies further hinders the development of teacher-friendly interfaces for AL algorithms. In this work, we consider AI explanations as a core element of the human-AI interface for teaching machines. When a human student learns, it is common to present one's own reasoning and solicit feedback from the teacher. When an ML model learns and still makes mistakes, the teacher ought to be able to understand the reasoning underlying those mistakes. When the model matures, the teacher should be able to recognize its progress in order to trust and feel confident about the teaching outcome. Toward this vision, we propose a novel paradigm of explainable active learning (XAL), which introduces techniques from the surging field of explainable AI (XAI) into an AL setting. We conducted an empirical study comparing the model learning outcomes, feedback content, and teacher experience under XAL with those under traditional AL and coactive learning (showing the model's prediction without an explanation). The study shows benefits of AI explanations as interfaces for machine teaching, namely supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks, namely an anchoring effect on the model's judgment and additional cognitive workload. It also reveals important individual factors that mediate a machine teacher's reception of AI explanations, including task knowledge, AI experience, and Need for Cognition. Reflecting on these results, we suggest future directions and design implications for XAL and, more broadly, for machine teaching through AI explanations.
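To make the paradigm concrete, the following is a minimal sketch of one XAL query round, assuming a pool-based setup with a linear classifier, uncertainty sampling as the query strategy, and per-feature contributions (coefficient times feature value) as a simple local explanation. The helper names (query_instance, explain) and the toy data are illustrative assumptions, not the study's implementation.

```python
# A minimal sketch of one explainable active learning (XAL) round, assuming a
# pool-based setup with a linear classifier. Helper names and the toy data are
# illustrative assumptions, not the paper's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_instance(model, X_pool):
    """Uncertainty sampling: pick the unlabeled instance whose predicted
    probability is closest to the decision boundary (0.5)."""
    proba = model.predict_proba(X_pool)[:, 1]
    return int(np.argmin(np.abs(proba - 0.5)))

def explain(model, x, feature_names):
    """Simple local explanation for a linear model: each feature's signed
    contribution (coefficient * value), sorted by magnitude."""
    contributions = model.coef_[0] * x
    order = np.argsort(-np.abs(contributions))
    return [(feature_names[i], float(contributions[i])) for i in order]

# Toy data standing in for an initial labeled set and an unlabeled pool.
rng = np.random.default_rng(0)
feature_names = [f"f{i}" for i in range(5)]
X_labeled = rng.normal(size=(20, 5))
y_labeled = rng.integers(0, 2, size=20)
X_pool = rng.normal(size=(100, 5))

model = LogisticRegression().fit(X_labeled, y_labeled)

idx = query_instance(model, X_pool)              # the model selects an instance
x = X_pool[idx]
prediction = model.predict(x.reshape(1, -1))[0]  # shown in coactive learning and XAL
explanation = explain(model, x, feature_names)   # additionally shown in XAL

print(f"Queried instance {idx}, predicted class {prediction}")
for name, contribution in explanation[:3]:
    print(f"  {name}: {contribution:+.2f}")
# The machine teacher then provides a label (and, in XAL, possibly feedback on
# the explanation); the instance is added to the labeled set and the model is
# retrained before the next query.
```

In this sketch, the traditional AL condition would show the teacher neither the prediction nor the explanation, coactive learning would show only the prediction, and XAL would show both.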
