Supporting DNN Safety Analysis and Retraining through Heatmap-based Unsupervised Learning

Deep neural networks (DNNs) are increasingly critical in modern safety-critical systems, for example in their perception layer to analyze images. Unfortunately, there is a lack of methods to ensure the functional safety of DNN-based components. The machine learning literature suggests that one should trust DNNs demonstrating high accuracy on test sets and that, in case of low accuracy, DNNs should be retrained using additional inputs similar to the error-inducing ones. We observe two major challenges with these practices when applied to safety-critical systems: (1) scenarios that are underrepresented in the test set may represent serious risks, potentially leading to safety violations, yet may go unnoticed; (2) debugging DNNs is poorly supported when the causes of errors are difficult to detect visually. To address these problems, we propose HUDD, an approach that automatically supports the identification of root causes of DNN errors. HUDD groups error-inducing images whose results are due to common subsets of selected DNN neurons; it identifies root causes by applying a clustering algorithm to matrices (i.e., heatmaps) that capture the relevance of every DNN neuron to the DNN outcome. HUDD then retrains DNNs with images that are automatically selected based on their relatedness to the identified image clusters. We have evaluated HUDD with DNNs from the automotive domain. The approach was able to automatically identify all the distinct root causes of DNN errors, thus supporting safety analysis. Furthermore, our retraining approach has been shown to be more effective at improving DNN accuracy than existing approaches.
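
To make the clustering step more concrete, the following Python sketch illustrates how error-inducing images could be grouped by heatmap similarity. It is a minimal illustration only: `compute_heatmap` is a hypothetical placeholder for a relevance-propagation routine (e.g., LRP), the image names are synthetic, and the choice of Ward's agglomerative clustering with a fixed number of clusters is an assumption rather than the exact HUDD configuration.

```python
# Minimal sketch of heatmap-based clustering of error-inducing images,
# loosely following the approach described above. compute_heatmap is a
# hypothetical stand-in: HUDD derives heatmaps with a relevance-propagation
# technique, not random scores.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(seed=0)

def compute_heatmap(image_path, n_neurons=256):
    """Hypothetical placeholder: return one relevance score per neuron of a
    chosen DNN layer for the given error-inducing image."""
    return rng.random(n_neurons)

# 1) Compute one heatmap (flattened into a vector) per error-inducing image.
error_images = [f"error_{i:03d}.png" for i in range(40)]  # placeholder names
heatmaps = np.stack([compute_heatmap(p) for p in error_images])

# 2) Cluster images by heatmap similarity; agglomerative clustering with
#    Ward's criterion is one plausible choice (the cluster count of 5 is
#    purely illustrative).
tree = linkage(heatmaps, method="ward")
labels = fcluster(tree, t=5, criterion="maxclust")

# 3) Each cluster groups images whose errors involve similar neurons, so
#    inspecting a few members per cluster hints at a common root cause;
#    unlabeled images related to a cluster become candidates for retraining.
for c in np.unique(labels):
    members = [p for p, l in zip(error_images, labels) if l == c]
    print(f"cluster {c}: {len(members)} images, e.g., {members[:3]}")
```

In practice, the heatmap vectors would come from the relevance of neurons in a selected internal layer of the trained DNN, and the number of clusters would be chosen from the data (e.g., via a knee-point heuristic) rather than fixed in advance.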
