Test Metrics for Recurrent Neural Networks

Recurrent neural networks (RNNs) have been applied to a broad range of areas such as natural language processing, drug discovery, and video recognition. This paper develops a coverage-guided test framework, comprising three test metrics and a mutation-based test case generation method, for the validation of a major class of RNNs: long short-term memory networks (LSTMs). The test metrics are designed with respect to the internal structure of the LSTM layers and quantify, respectively, the information of the forget gate, the one-step information change of an aggregate hidden state, and the multi-step information evolution of positive and negative aggregate hidden states. We apply the test framework to several typical LSTM applications: a network trained on IMDB movie reviews for sentiment analysis, a network trained on the MNIST dataset for image classification, and a network trained on a lipophilicity dataset for scientific machine learning. Our experimental results show that coverage-guided testing can be used not only to extensively exercise the behaviour of the LSTM layer in order to discover safety loopholes (such as adversarial examples), but also to help understand the internal mechanism by which the LSTM layer processes data.
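The abstract does not reproduce the metric definitions, so the sketch below is purely illustrative: it assumes the standard LSTM gate equations and shows how forget-gate activations, the quantity underlying the first metric, could be traced over an input sequence and summarised with a simple threshold statistic. The function names, the stacked parameter layout, and the threshold-based statistic are hypothetical conveniences for exposition, not the paper's actual test metrics.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step using the standard gate equations. W, U, b hold the
    stacked parameters for the forget, input, output, and candidate-cell
    transformations, in that order."""
    z = W @ x_t + U @ h_prev + b            # pre-activations, shape (4*hidden,)
    n = len(h_prev)
    f_t = sigmoid(z[0:n])                   # forget gate
    i_t = sigmoid(z[n:2 * n])               # input gate
    o_t = sigmoid(z[2 * n:3 * n])           # output gate
    c_hat = np.tanh(z[3 * n:4 * n])         # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t, f_t

def forget_gate_trace(xs, W, U, b, hidden):
    """Run a sequence through the LSTM and record the forget-gate
    activations at every time step."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    trace = []
    for x_t in xs:
        h, c, f_t = lstm_step(x_t, h, c, W, U, b)
        trace.append(f_t)
    return np.array(trace)                  # shape (T, hidden)

# Random weights stand in for a trained network in this sketch.
rng = np.random.default_rng(0)
inp, hidden, T = 8, 16, 20
W = rng.normal(scale=0.5, size=(4 * hidden, inp))
U = rng.normal(scale=0.5, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

xs = rng.normal(size=(T, inp))
trace = forget_gate_trace(xs, W, U, b, hidden)

# Hypothetical coverage-style statistic: the fraction of forget-gate
# units per step that exceed a threshold, i.e. how strongly each step
# retains the previous cell state.
threshold = 0.5
retained = (trace > threshold).mean(axis=1)
print("per-step fraction of strongly-retaining forget units:", retained.round(2))
```

A coverage-guided tester in this style would treat such per-step statistics as coverage targets and mutate inputs to drive them into unexercised ranges; the paper's framework defines the precise targets for the forget gate and the aggregate hidden states.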
