A study of effectiveness of deep learning in locating real faults

Abstract

Context: Recent progress in deep learning has demonstrated a promising ability to make sense of data, and many fields have used it to learn effective models for their problems. Fault localization research has adopted deep learning as a debugging aid, with promising results. However, as far as we know, there are no detailed studies evaluating the benefits of deep learning for locating real faults in programs.

Objective: To understand these benefits, this paper studies the effectiveness of deep learning-based fault localization on a set of real bugs reported in widely used programs.

Method: We use three representative deep learning architectures (i.e., convolutional neural network, recurrent neural network, and multi-layer perceptron) for fault localization, and conduct large-scale experiments on 8 real-world programs whose faults are all real to evaluate their effectiveness.

Results: Localization effectiveness varies considerably among the three neural networks in the context of real faults. Specifically, the convolutional neural network performs best in locating real faults, showing average savings of 38.97% and 26.22% over the multi-layer perceptron and the recurrent neural network, respectively. The recurrent neural network and the multi-layer perceptron yield comparable effectiveness, with the recurrent neural network marginally ahead.

Conclusion: In the context of real faults, the convolutional neural network is the most effective for fault localization among the investigated architectures, and we suggest potential factors of deep learning for improving fault localization.
