An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring

Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for safe deployment. Such a framework ought to be high-performing, explainable, post-hoc, and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent space, and (b) our sequential mistrust score detects deviations in the correlations over the sequence of past input representations via a non-parametric, sliding-window algorithm, enabling actionable continuous monitoring. We evaluate TRUST-LAPSE on two downstream tasks: (1) distributionally shifted input detection and (2) data drift detection. We evaluate across diverse domains, audio and vision, using public datasets, and further benchmark our approach on challenging, real-world electroencephalogram (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve state-of-the-art results, with AUROCs of 84.1 (vision), 73.9 (audio), and 77.1 (clinical EEGs), outperforming baselines by over 10 points. We expose critical failures in popular baselines that remain insensitive to the semantic content of inputs, rendering them unfit for real-world model monitoring. Our sequential mistrust scores achieve high drift detection rates: over 90% of the streams show <20% error across all domains. Through extensive qualitative and quantitative evaluations, we show that our mistrust scores are more robust and provide explainability, easing adoption in practice.
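
To make the latent-space score concrete, below is a minimal sketch of one way to combine Mahalanobis distance and cosine similarity over encoder embeddings. This is not the authors' released code: the function names (fit_latent_stats, mistrust_score), the shared-covariance choice, and the min-distance-minus-max-similarity combination are all illustrative assumptions.

```python
# Hypothetical sketch of a latent-space mistrust score: Mahalanobis distance
# to class-conditional Gaussians fit on training embeddings, combined with
# cosine similarity to the per-class means. Higher score = less trustworthy.
import numpy as np

def fit_latent_stats(train_embeddings, train_labels):
    """Fit per-class means and a shared precision matrix on training embeddings."""
    classes = np.unique(train_labels)
    means = {c: train_embeddings[train_labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([train_embeddings[train_labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False)
    # Regularize before inverting so the precision matrix is well conditioned.
    precision = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return means, precision

def mistrust_score(z, means, precision):
    """Min Mahalanobis distance minus max cosine similarity over classes (an assumed combination)."""
    d_mahal, s_cos = [], []
    for mu in means.values():
        diff = z - mu
        d_mahal.append(float(diff @ precision @ diff))
        s_cos.append(float(z @ mu / (np.linalg.norm(z) * np.linalg.norm(mu) + 1e-12)))
    return min(d_mahal) - max(s_cos)
```

A sample far from every class mean (large Mahalanobis distance) and poorly aligned with every class direction (low cosine similarity) receives a high mistrust score, which is the behavior the abstract's distance-plus-similarity formulation calls for.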

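The sequential score is described only as a non-parametric, sliding-window algorithm over past input representations. The sketch below illustrates the shape of such a monitoring loop; it deliberately swaps in a plain two-sample Mann-Whitney U test between a frozen reference window and the most recent window, and is not the paper's actual correlation-based procedure.

```python
# Illustrative sliding-window drift check over a stream of per-sample mistrust
# scores. A frozen reference window (collected during warm-up) is compared
# against the most recent window with a non-parametric two-sample test.
from collections import deque
from scipy.stats import mannwhitneyu

def monitor_stream(scores, window=100, alpha=0.01):
    """Yield (index, drifted) flags as scores stream in."""
    reference = deque(maxlen=window)  # scores observed while in distribution
    recent = deque(maxlen=window)     # sliding window of the latest scores
    for i, s in enumerate(scores):
        recent.append(s)
        if len(reference) < window:
            reference.append(s)  # warm-up: fill the reference window first
            continue
        _, p = mannwhitneyu(reference, recent, alternative="two-sided")
        yield i, p < alpha  # small p-value -> windows differ -> flag drift
```

Because the test is rank-based, it makes no distributional assumption about the scores, matching the non-parametric requirement; the window size and alpha threshold here are arbitrary placeholders.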