Measuring the Quality of Explanations: The System Causability Scale (SCS)

Recent successes in Artificial Intelligence (AI) and Machine Learning (ML) allow problems to be solved automatically, without any human intervention. Such autonomous approaches can be very convenient. However, in certain domains, e.g., the medical domain, it is necessary for a domain expert to understand why an algorithm arrived at a certain result. Consequently, the field of Explainable AI (xAI) has rapidly gained interest worldwide across many domains, particularly in medicine. Explainable AI studies the transparency and traceability of opaque AI/ML models, and a wide variety of methods already exists. For example, layer-wise relevance propagation can highlight which parts of the input to, and which internal representations of, a neural network contributed to a given result. This is an important first step towards ensuring that end users, e.g., medical professionals, can assume responsibility for decision making with AI/ML, and it is of interest to professionals and regulators alike. Interactive ML adds the component of human expertise to AI/ML processes by enabling domain experts to re-enact and retrace AI/ML results, e.g., to check them for plausibility. This requires new human–AI interfaces for explainable AI. In order to build effective and efficient interactive human–AI interfaces, we have to address the question of how to evaluate the quality of explanations given by an explainable AI system. In this paper we introduce our System Causability Scale (SCS) to measure the quality of explanations. It is based on our notion of causability (Holzinger et al. in Wiley Interdiscip Rev Data Min Knowl Discov 9(4), 2019) combined with concepts adapted from a widely accepted usability scale.
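To make the scoring concrete, here is a minimal sketch of how responses to an SCS-style questionnaire could be aggregated. It assumes, in line with the SUS tradition the scale adapts, ten Likert items rated from 1 (strongly disagree) to 5 (strongly agree), with the final score normalized by the maximum attainable sum; the item count, rating range, and example responses are illustrative assumptions, not details quoted from this paper.

    # Minimal sketch: scoring an SCS-style questionnaire (assumed scheme:
    # ten Likert items, ratings 1-5, score = sum of ratings / maximum sum).
    from typing import Sequence

    NUM_ITEMS = 10                 # assumed number of questionnaire items
    MIN_RATING, MAX_RATING = 1, 5  # assumed 5-point Likert scale

    def scs_score(ratings: Sequence[int]) -> float:
        """Return the normalized scale score in the range [0, 1]."""
        if len(ratings) != NUM_ITEMS:
            raise ValueError(f"expected {NUM_ITEMS} ratings, got {len(ratings)}")
        if not all(MIN_RATING <= r <= MAX_RATING for r in ratings):
            raise ValueError(f"each rating must lie in [{MIN_RATING}, {MAX_RATING}]")
        return sum(ratings) / (NUM_ITEMS * MAX_RATING)

    # Example: a respondent who largely agrees with the questionnaire items.
    print(scs_score([4, 5, 4, 3, 5, 4, 4, 5, 3, 4]))  # -> 0.82

Under this scheme the example response sums to 41 out of a possible 50, giving a score of 0.82; scores closer to 1 would indicate that the explanation process better supports the user's understanding of why the system produced its result.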

References

[1] Bram van Ginneken, et al. A survey on deep learning in medical image analysis, 2017, Medical Image Analysis.

[2] Andreas Holzinger, et al. Interactive machine learning: experimental evidence for the human in the algorithmic loop, 2018, Applied Intelligence.

[3] André M. Carrington. Kernel Methods and Measures for Classification with Transparency, Interpretability and Accuracy in Health Care, 2018.

[4] Philip Greenland, et al. Assessment of Cardiovascular Risk by Use of Multiple-Risk-Factor Assessment Equations, 1999.

[5] Andreas Holzinger, et al. User-Centered Interface Design for Disabled and Elderly People: First Experiences with Designing a Patient Communication System (PACOSY), 2002, ICCHP.

[6] Kristian Kersting, et al. Explanatory Interactive Machine Learning, 2019, AIES.

[7] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[8] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.

[9] David C. Kale, et al. Do no harm: a roadmap for responsible machine learning for health care, 2019, Nature Medicine.

[10] Andreas Holzinger, et al. Interactive machine learning for health informatics: when do we need the human-in-the-loop?, 2016, Brain Informatics.

[11] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, arXiv.

[12] Been Kim, et al. Towards A Rigorous Science of Interpretable Machine Learning, 2017, arXiv:1702.08608.

[13] Paul Fieguth. Statistical Image Processing and Multidimensional Modeling, 2010.

[14] Johannes Gehrke, et al. Intelligible models for classification and regression, 2012, KDD.

[15] S. Jamieson. Likert scales: how to (ab)use them, 2004, Medical Education.

[16] Emily Chen, et al. How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation, 2018, arXiv.

[17] S. M. Grundy, et al. Assessment of cardiovascular risk by use of multiple-risk-factor assessment equations: a statement for healthcare professionals from the American Heart Association and the American College of Cardiology, 1999, Circulation.

[18] Guido Bologna, et al. Characterization of Symbolic Rules Embedded in Deep DIMLP Networks: A Challenge to Transparency of Deep Learning, 2017, Journal of Artificial Intelligence and Soft Computing Research.

[19] Elena Marchiori, et al. Location Sensitive Deep Convolutional Neural Networks for Segmentation of White Matter Hyperintensities, 2016, Scientific Reports.

[20] Sebastian Thrun, et al. Dermatologist-level classification of skin cancer with deep neural networks, 2017, Nature.

[21] James T. Miller, et al. An Empirical Evaluation of the System Usability Scale, 2008, International Journal of Human-Computer Interaction.

[22] R. Likert. A Technique for the Measurement of Attitudes, 2022, The SAGE Encyclopedia of Research Design.

[23] Katrien Verbert, et al. Recommender Systems for Health Informatics: State-of-the-Art and Future Perspectives, 2016, Machine Learning for Health Informatics.

[24] Guigang Zhang, et al. Deep Learning, 2016, International Journal of Semantic Computing.

[25] Bhiksha Raj, et al. Probabilistic Latent Variable Models as Nonnegative Factorizations, 2008, Computational Intelligence and Neuroscience.

[26] R. McPherson, et al. Recommendations for the management of dyslipidemia and the prevention of cardiovascular disease: summary of the 2003 update, 2003, CMAJ: Canadian Medical Association Journal.

[27] Andreas Holzinger, et al. KANDINSKY Patterns as IQ-Test for Machine Learning, 2019, CD-MAKE.

[28] Risto Miikkulainen, et al. Evolving Neural Networks to Play Go, 2004, Applied Intelligence.

[29] Jeff Sauro, et al. The Factor Structure of the System Usability Scale, 2009, HCI.

[30] Georg Langs, et al. Causability and explainability of artificial intelligence in medicine, 2019, WIREs Data Mining and Knowledge Discovery.

[31] Andreas Holzinger, et al. Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome, 2019, BMC Medical Informatics and Decision Making.

[32] Hao Chen, et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge, 2016, Medical Image Analysis.

[33] J. B. Brooke. SUS: A 'Quick and Dirty' Usability Scale, 1996.

[34] A Min Tjoa, et al. Current Advances, Trends and Challenges of Machine Learning and Knowledge Extraction: From Machine Learning to Explainable AI, 2018, CD-MAKE.

[35] Ajay Chander, et al. Evaluating Explanations by Cognitive Value, 2018, CD-MAKE.