Exposing numerical bugs in deep learning via gradient back-propagation

Numerical computation dominates deep learning (DL) programs. Consequently, numerical bugs are among the most prominent kinds of defects in DL programs. Numerical bugs can produce exceptional values such as NaN (Not-a-Number) and INF (Infinity), which propagate through the computation and eventually cause crashes or invalid outputs. Such bugs occur when particular inputs drive invalid argument values into internal mathematical operations such as log(). In this paper, we propose the first dynamic technique, called GRIST, which automatically generates a small input that exposes numerical bugs in DL programs. GRIST piggy-backs on the built-in gradient-computation functionality of DL infrastructures. Our evaluation on 63 real-world DL programs shows that GRIST detects 78 bugs, including 56 previously unknown bugs. After we submitted them to the corresponding issue repositories, eight bugs were confirmed and three were fixed. Moreover, GRIST achieves an 8.79X speedup in exposing numerical bugs compared to running the original programs with their provided inputs. Compared with the state-of-the-art static technique DEBAR, DEBAR produces 12 false positives and misses 31 true bugs (30 of which GRIST finds), whereas GRIST misses only one known bug in those programs and produces no false positives. These results demonstrate the effectiveness of GRIST.
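The gradient-guided idea can be illustrated with a minimal, self-contained sketch (not GRIST's actual implementation): treat the argument of an unsafe operation such as log() as an objective, and use gradient descent on the program input to push that argument into its invalid range. The program `g`, its analytic gradient, the starting input, and the learning rate below are all illustrative assumptions standing in for a real DL program and its autograd machinery.

```python
import math

# Hypothetical DL-style program that eventually computes log(g(x)).
# g, its gradient, and the learning rate are illustrative assumptions,
# not part of GRIST itself.
def g(x):
    return x * x - 2 * x + 0.5   # argument fed to log(); can go negative

def g_grad(x):
    return 2 * x - 2             # analytic gradient, standing in for autograd

x = 5.0        # initial input, for which g(5.0) = 15.5 is still a valid log() argument
lr = 0.1
steps = 0
while g(x) > 0 and steps < 1000:
    x -= lr * g_grad(x)          # descend on the unsafe op's argument
    steps += 1

# The generated input now drives log() outside its domain,
# exposing the latent numerical bug.
try:
    math.log(g(x))
except ValueError:
    print(f"numerical bug exposed after {steps} steps, input x = {x:.4f}")
```

In a real DL program the gradient of the unsafe operation's argument with respect to the input is obtained from the framework's back-propagation rather than written by hand, which is what the abstract means by piggy-backing on built-in gradient computation.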

[1] Yongqiang Tian, et al. A comprehensive study of deep learning compiler bugs, 2021, ESEC/SIGSOFT FSE.

[2] Xuyuan Dong, et al. Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis, 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).

[3] Xuyuan Dong, et al. Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation, 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).

[4] Liqian Chen, et al. Detecting numerical bugs in neural network architectures, 2020, ESEC/SIGSOFT FSE.

[5] Ming Yan, et al. Deep learning library testing via effective model generation, 2020, ESEC/SIGSOFT FSE.

[6] Miryung Kim, et al. Is neuron coverage a meaningful measure for testing deep neural networks?, 2020, ESEC/SIGSOFT FSE.

[7] Lingming Zhang, et al. Practical Accuracy Estimation for Efficient Deep Neural Network Testing, 2020, ACM Trans. Softw. Eng. Methodol.

[8] Junjie Chen, et al. Enhanced Compiler Bug Isolation via Memoized Search, 2020, 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[9] Junjie Chen, et al. How Incidental are the Incidents? Characterizing and Prioritizing Incidents for Large-Scale Online Service Systems, 2020, 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[10] Hui Guo, et al. Efficient Generation of Error-Inducing Floating-Point Inputs via Symbolic Execution, 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[11] Wencong Xiao, et al. An Empirical Study on Program Failures of Deep Learning Jobs, 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[12] Hridesh Rajan, et al. Repairing Deep Neural Networks: Fix Patterns and Challenges, 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[13] Lei Ma, et al. A Quantitative Analysis Framework for Recurrent Neural Network, 2019, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[14] Gabriele Bavota, et al. Taxonomy of Real Faults in Deep Learning Systems, 2019, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[15] Michael R. Lyu, et al. An Empirical Study of Common Challenges in Developing Deep Learning Applications, 2019, 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE).

[16] Junjie Chen, et al. Continuous Incident Triage for Large-Scale Online Service Systems, 2019, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[17] Foutse Khomh, et al. DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks, 2019, 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[18] Lei Ma, et al. DeepHunter: a coverage-guided fuzz testing framework for deep neural networks, 2019, ISSTA.

[19] Foutse Khomh, et al. TFCheck: A TensorFlow Library for Detecting Training Issues in Neural Network Programs, 2019, 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS).

[20] Zhendong Su, et al. Effective floating-point analysis via weak-distance minimization, 2019, PLDI.

[21] Hridesh Rajan, et al. A comprehensive study on deep learning bug characteristics, 2019, ESEC/SIGSOFT FSE.

[22] Lin Tan, et al. CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries, 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[23] Lei Ma, et al. DeepCT: Tomographic Combinatorial Testing for Deep Learning Systems, 2019, 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[24] Michael Pradel, et al. How Many of All Bugs Do We Find? A Study of Static Bug Detectors, 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[25] Yue Zhao, et al. DLFuzz: differential fuzzing testing of deep learning systems, 2018, ESEC/SIGSOFT FSE.

[26] Shin Yoo, et al. Guiding Deep Learning System Testing Using Surprise Adequacy, 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[27] Yifan Chen, et al. An empirical study on TensorFlow program bugs, 2018, ISSTA.

[28] Ian Goodfellow, et al. TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing, 2018, ICML.

[29] Lei Ma, et al. DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems, 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[30] Mingyan Liu, et al. Spatially Transformed Adversarial Examples, 2018, ICLR.

[31] Cho-Jui Hsieh, et al. Towards Robust Neural Networks via Random Self-ensemble, 2017, ECCV.

[32] Anthony Di Franco, et al. A comprehensive study of real-world numerical bug characteristics, 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[33] Xiangyu Zhang, et al. Software Numerical Instability Detection and Diagnosis by Combining Stochastic and Infinite-Precision Testing, 2017, IEEE Transactions on Software Engineering.

[34] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[35] Junfeng Yang, et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems, 2017, SOSP.

[36] Zhendong Su, et al. Achieving high coverage for floating-point code via unconstrained programming, 2017, PLDI.

[37] Christoph Lassner, et al. Early Stopping without a Validation Set, 2017, arXiv.

[38] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[39] Samy Bengio, et al. Adversarial examples in the physical world, 2016, ICLR.

[40] Ananthram Swami, et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[41] Jianxiong Xiao, et al. DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[43] Xiaogang Wang, et al. Deep Learning Face Representation by Joint Identification-Verification, 2014, NIPS.

[44] Mark Lillibridge, et al. PLDI 2002: Extended static checking for Java, 2013, SIGP.

[45] Premkumar T. Devanbu, et al. To what extent could we detect field defects? An empirical study of false negatives in static bug finding tools, 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[46] J. D. Morgenthaler, et al. Using Static Analysis to Find Bugs, 2008, IEEE Software.

[47] Greg Nelson, et al. Extended static checking for Java, 2002, PLDI '02.

[48] Louis B. Rall, et al. Automatic Differentiation: Techniques and Applications, 1981, Lecture Notes in Computer Science.

[49] Md. Sazzad Hossien Chowdhury, et al. Calculus with single variable, 2011.

[50] Lutz Prechelt, et al. Early Stopping - But When?, 1996, Neural Networks: Tricks of the Trade.