Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition

Explainable AI (XAI) techniques have been widely used to help explain and understand the output of deep learning models in fields such as image classification and Natural Language Processing. Interest in using XAI techniques to explain deep learning-based automatic speech recognition (ASR) is emerging, but there is not enough evidence on whether these explanations can be trusted. To address this, we adapt a state-of-the-art XAI technique from the image classification domain, Local Interpretable Model-Agnostic Explanations (LIME), to a model trained for a TIMIT-based phoneme recognition task. This simple task provides a controlled setting for evaluation while also providing expert-annotated ground truth to assess the quality of explanations. We find that a variant of LIME based on time-partitioned audio segments, which we propose in this paper, produces the most reliable explanations, containing the ground truth 96% of the time in its top three audio segments.
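To make the time-partitioned LIME variant concrete, the sketch below shows one plausible way to implement it: the waveform is split into equal-length time segments, segments are randomly silenced to create perturbed inputs, and a weighted linear surrogate is fit to the black-box model's output for the target phoneme. This is a minimal illustration, not the paper's exact implementation; the `predict_proba` interface, the zero-silencing perturbation, and all parameter values (segment count, sample count, kernel width) are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge


def lime_time_segments(waveform, predict_proba, target_class,
                       n_segments=8, n_samples=500, kernel_width=0.25,
                       seed=None):
    """LIME-style attribution over equal-length time segments of a waveform.

    `predict_proba(batch_of_waveforms)` is a hypothetical black-box interface
    returning one row of phoneme-class probabilities per waveform.
    Returns one importance weight per time segment.
    """
    rng = np.random.default_rng(seed)
    bounds = np.linspace(0, len(waveform), n_segments + 1, dtype=int)

    # Random binary masks: 1 keeps a segment, 0 silences it.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    masks[0] = 1  # always include the unperturbed signal

    perturbed = np.tile(waveform, (n_samples, 1))
    for i, mask in enumerate(masks):
        for s in range(n_segments):
            if mask[s] == 0:
                # Silence the s-th time segment (one possible perturbation).
                perturbed[i, bounds[s]:bounds[s + 1]] = 0.0

    probs = np.asarray(predict_proba(perturbed))[:, target_class]

    # Weight perturbed samples by similarity to the original signal,
    # measured here as the fraction of segments kept.
    similarity = masks.mean(axis=1)
    weights = np.exp(-((1.0 - similarity) ** 2) / kernel_width ** 2)

    # Weighted linear surrogate; its coefficients are the segment importances.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, probs, sample_weight=weights)
    return surrogate.coef_
```

Under these assumptions, ranking segments by the surrogate coefficients and checking whether the top-ranked segments overlap the expert-annotated phoneme interval mirrors the kind of ground-truth evaluation described in the abstract.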
