Cost-Effective Testing of a Deep Learning Model through Input Reduction

With the increasing adoption of Deep Learning (DL) models in various applications, testing DL models is vitally important. However, testing DL models is costly; for example, manual labelling of test inputs is widely recognized to be expensive. To reduce this cost, we propose to select only a subset of the testing data that is small yet representative enough to quickly estimate the performance of a DL model. Our approach, DeepReduce, adopts a two-phase strategy. First, it selects testing data to satisfy testing adequacy. Then, it selects additional testing data so that the distribution of the selected data approximates that of the whole testing data, by minimizing the relative entropy between the two. We evaluate DeepReduce on four widely used datasets (15 models in total). We find that DeepReduce reduces the testing data to 7.5% of its original size on average while still reliably estimating the performance of the DL models.
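The two-phase strategy described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DeepReduce's actual implementation: the abstract does not say which adequacy criterion, which model output, or which stopping rule is used. Here each test input is assumed to come with a precomputed set of covered coverage elements (e.g. activated neurons) and a precomputed bin id for some scalar model output. `greedy_coverage`, `select_by_kl`, `deep_reduce_sketch`, `budget` and `threshold` are hypothetical names. Phase 2 greedily minimizes the relative entropy D_KL(P || Q) = sum_i P(i) log(P(i) / Q(i)), where P is the binned distribution over the whole testing data and Q is the distribution over the selected subset.

```python
import math
from typing import List, Sequence, Set

# Minimal sketch of two-phase test-input selection. The coverage sets,
# the per-input bin ids and all function names are hypothetical stand-ins.


def greedy_coverage(coverage: Sequence[Set[int]]) -> List[int]:
    """Phase 1: greedily pick inputs until the subset covers every
    coverage element (e.g. neuron) that the full test set covers."""
    target: Set[int] = set().union(*coverage)
    covered: Set[int] = set()
    selected: List[int] = []
    while covered != target:
        # Input adding the most not-yet-covered elements (first index wins ties).
        best = max(range(len(coverage)), key=lambda i: len(coverage[i] - covered))
        selected.append(best)
        covered |= coverage[best]
    return selected


def histogram(bin_ids: Sequence[int], indices, num_bins: int, eps: float = 1e-6) -> List[float]:
    """Smoothed, normalized histogram of the bin ids of the given inputs."""
    counts = [eps] * num_bins  # eps keeps every bin non-zero so KL stays finite
    for i in indices:
        counts[bin_ids[i]] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D_KL(P || Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def select_by_kl(bin_ids: Sequence[int], num_bins: int, initial: Sequence[int],
                 budget: int, threshold: float = 1e-3) -> List[int]:
    """Phase 2: repeatedly add the input whose inclusion gives the lowest
    D_KL(whole || selected), until the threshold or the budget is reached."""
    p = histogram(bin_ids, range(len(bin_ids)), num_bins)
    selected = list(initial)
    remaining = sorted(set(range(len(bin_ids))) - set(selected))
    while remaining and len(selected) < budget:
        if kl_divergence(p, histogram(bin_ids, selected, num_bins)) <= threshold:
            break  # selected data already approximates the whole distribution
        best = min(remaining,
                   key=lambda i: kl_divergence(p, histogram(bin_ids, selected + [i], num_bins)))
        selected.append(best)
        remaining.remove(best)
    return selected


def deep_reduce_sketch(coverage: Sequence[Set[int]], bin_ids: Sequence[int],
                       num_bins: int, budget: int) -> List[int]:
    """Run phase 1, then top up with phase 2 up to `budget` inputs."""
    return select_by_kl(bin_ids, num_bins, greedy_coverage(coverage), budget)
```

For example, with `coverage = [{0, 1}, {1}, {2}, {0}, set()]` and `bin_ids = [0, 0, 1, 1, 0]`, phase 1 picks inputs `[0, 2]` to cover all three elements, and with `budget=3` phase 2 adds input `1`, moving the subset's bin distribution from 50/50 toward the full set's 60/40. Both phases are greedy here for simplicity; an exact minimum-size cover or a globally KL-optimal subset would be far more expensive to compute.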
