Software Defect Prediction Based on Gated Hierarchical LSTMs

Software defect prediction, aimed at assisting software practitioners in allocating test resources more efficiently, predicts the potential defective modules in software products. With the development of defect prediction technology, the inability of traditional software features to capture semantic information is exposed, hence related researchers have turned to semantic features to build defect prediction models. However, sometimes traditional features such as lines of code (LOC) also play an important role in defect prediction. Most of the existing researches only focus on using a single type of feature as the input of the model. In this article, a defect prediction method based on gated hierarchical long short-term memory networks (GH-LSTMs) is proposed, which uses hierarchical LSTM networks to extract both semantic features from word embeddings of abstract syntax trees (ASTs) of source code files, and traditional features provided by the PROMISE repository. More importantly, we adopt a gated fusion strategy to combine the outputs of the hierarchical networks properly. Experimental results show that GH-LSTMs outperforms existing methods under both noneffort-aware and effort-aware scenarios.

[1]  N. Nagappan,et al.  Use of relative code churn measures to predict system defect density , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[2]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[3]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[4]  Jin Liu,et al.  Dictionary learning based software defect prediction , 2014, ICSE.

[5]  Yue Yu,et al.  Seml: A Semantic LSTM Model for Software Defect Prediction , 2019, IEEE Access.

[6]  Meng Liu,et al.  Do different cross‐project defect prediction methods identify the same defective modules? , 2019, J. Softw. Evol. Process..

[7]  Yong Hu,et al.  Systematic literature review of machine learning based software development effort estimation models , 2012, Inf. Softw. Technol..

[8]  David Lo,et al.  Practitioners' expectations on automated fault localization , 2016, ISSTA.

[9]  Baowen Xu,et al.  A Gated Hierarchical LSTMs for Target-based Sentiment Analysis , 2018, SEKE.

[10]  Maurice H. Halstead,et al.  Elements of software science (Operating and programming systems series) , 1977 .

[11]  Xinli Yang,et al.  Deep Learning for Just-in-Time Defect Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[12]  Yuming Zhou,et al.  How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect Prediction , 2018, ACM Trans. Softw. Eng. Methodol..

[13]  Banu Diri,et al.  A systematic review of software fault prediction studies , 2009, Expert Syst. Appl..

[14]  Yuxiang Shen,et al.  An empirical study on pareto based multi-objective feature selection for software defect prediction , 2019, J. Syst. Softw..

[15]  Jongmoon Baik,et al.  Value-cognitive boosting with a support vector machine for cross-project defect prediction , 2014, Empirical Software Engineering.

[16]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[17]  Jaechang Nam,et al.  Deep Semantic Feature Learning for Software Defect Prediction , 2020, IEEE Transactions on Software Engineering.

[18]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[19]  David Lo,et al.  Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction , 2018, Empirical Software Engineering.

[20]  Banu Diri,et al.  Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem , 2009, Inf. Sci..

[21]  Wei Hua,et al.  FCCA: Hybrid Code Representation for Functional Clone Detection Using Attention Networks , 2020, IEEE Transactions on Reliability.

[22]  Guisheng Fan,et al.  Deep Semantic Feature Learning with Embedded Static Metrics for Software Defect Prediction , 2019, 2019 26th Asia-Pacific Software Engineering Conference (APSEC).

[23]  Lech Madeyski,et al.  Towards identifying software project clusters with regard to defect prediction , 2010, PROMISE '10.

[24]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[25]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[26]  Yuming Zhou,et al.  Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models , 2016, SIGSOFT FSE.

[27]  Uri Alon,et al.  code2vec: learning distributed representations of code , 2018, Proc. ACM Program. Lang..

[28]  Yue Jiang,et al.  Techniques for evaluating fault prediction models , 2008, Empirical Software Engineering.

[29]  Ming Li,et al.  Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code , 2017, IJCAI.

[30]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[31]  Akito Monden,et al.  The Effects of Over and Under Sampling on Fault-prone Module Detection , 2007, First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007).

[32]  Andreas Zeller,et al.  Change Bursts as Defect Predictors , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[33]  Jian Li,et al.  Software Defect Prediction via Convolutional Neural Network , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[34]  Frances E. Allen,et al.  Control-flow analysis , 2022 .

[35]  Yuming Zhou,et al.  Connecting software metrics across versions to predict defects , 2017, 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[36]  Tim Menzies,et al.  Learning from Open-Source Projects: An Empirical Study on Defect Prediction , 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[37]  Tian Jiang,et al.  Personalized defect prediction , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[38]  Karim O. Elish,et al.  Predicting defect-prone software modules using support vector machines , 2008, J. Syst. Softw..

[39]  Audris Mockus,et al.  A large-scale empirical study of just-in-time quality assurance , 2013, IEEE Transactions on Software Engineering.

[40]  David Lo,et al.  Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[41]  Cristina V. Lopes,et al.  SourcererCC and SourcererCC-I: Tools to Detect Clones in Batch Mode and during Software Development , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C).

[42]  Rainer Koschke,et al.  Effort-Aware Defect Prediction Models , 2010, 2010 14th European Conference on Software Maintenance and Reengineering.

[43]  Minh Le Nguyen,et al.  Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction , 2017, 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI).