Connecting software metrics across versions to predict defects

Accurate software defect prediction could help software practitioners allocate test resources to defect-prone modules effectively and efficiently. In the last decades, much effort has been devoted to build accurate defect prediction models, including developing quality defect predictors and modeling techniques. However, current widely used defect predictors such as code metrics and process metrics could not well describe how software modules change over the project evolution, which we believe is important for defect prediction. In order to deal with this problem, in this paper, we propose to use the Historical Version Sequence of Metrics (HVSM) in continuous software versions as defect predictors. Furthermore, we leverage Recurrent Neural Network (RNN), a popular modeling technique, to take HVSM as the input to build software prediction models. The experimental results show that, in most cases, the proposed HVSM-based RNN model has significantly better effort-aware ranking effectiveness than the commonly used baseline models.

[1]  Shaomin Wu,et al.  A scored AUC Metric for Classifier Evaluation and Selection , 2005 .

[2]  Jun Wang,et al.  Compressed C4.5 Models for Software Defect Prediction , 2012, 2012 12th International Conference on Quality Software.

[3]  Burak Turhan,et al.  Search Based Training Data Selection For Cross Project Defect Prediction , 2016, PROMISE.

[4]  Ayse Basar Bener,et al.  Defect prediction from static code features: current results, limitations, new approaches , 2010, Automated Software Engineering.

[5]  Song Wang,et al.  Automatically Learning Semantic Features for Defect Prediction , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[6]  Yi Zhang,et al.  Classifying Software Changes: Clean or Buggy? , 2008, IEEE Transactions on Software Engineering.

[7]  Gerardo Canfora,et al.  Defect prediction as a multiobjective optimization problem , 2015, Softw. Test. Verification Reliab..

[8]  Nachiappan Nagappan,et al.  Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study , 2007, First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007).

[9]  Xiaodong Gu,et al.  Deep API learning , 2016, SIGSOFT FSE.

[10]  Jaechang Nam,et al.  CLAMI: Defect Prediction on Unlabeled Datasets , 2015, ASE 2015.

[11]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[12]  Audris Mockus,et al.  Predicting risk of software changes , 2000, Bell Labs Technical Journal.

[13]  A. Scott,et al.  A Cluster Analysis Method for Grouping Means in the Analysis of Variance , 1974 .

[14]  Michele Lanza,et al.  Evaluating defect prediction approaches: a benchmark and an extensive comparison , 2011, Empirical Software Engineering.

[15]  Jaechang Nam,et al.  CLAMI: Defect Prediction on Unlabeled Datasets (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[16]  Jin Liu,et al.  Dictionary learning based software defect prediction , 2014, ICSE.

[17]  E.J. Weyuker,et al.  Using Developer Information as a Factor for Fault Prediction , 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[18]  Ahmed E. Hassan,et al.  Prioritizing the devices to test your app on: a case study of Android game apps , 2014, SIGSOFT FSE.

[19]  Peter A. Flach,et al.  Scored AUC Metrics for Classifier Evaluation and Selection , 2005 .

[20]  Martin Pinzger,et al.  Method-level bug prediction , 2012, Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement.

[21]  Hyunsoo Yoon,et al.  Application of fully recurrent neural networks for speech recognition , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[22]  Yuming Zhou,et al.  An empirical study on dependence clusters for effort-aware fault-proneness prediction , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[23]  Sinno Jialin Pan,et al.  Transfer defect learning , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[24]  Shane McIntosh,et al.  Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[25]  Lionel C. Briand,et al.  A systematic and comprehensive investigation of methods to build and evaluate fault prediction models , 2010, J. Syst. Softw..

[26]  Karim O. Elish,et al.  Predicting defect-prone software modules using support vector machines , 2008, J. Syst. Softw..

[27]  Hoh Peter In,et al.  Micro interaction metrics for defect prediction , 2011, ESEC/FSE '11.

[28]  Yuming Zhou,et al.  Empirical analysis of network measures for effort-aware fault-proneness prediction , 2016, Inf. Softw. Technol..

[29]  Harald C. Gall,et al.  A Search-based Training Algorithm for Cost-aware Defect Prediction , 2016, GECCO.

[30]  Witold Pedrycz,et al.  A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[31]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[32]  Tao Wang,et al.  Naive Bayes Software Defect Prediction Model , 2010, 2010 International Conference on Computational Intelligence and Software Engineering.

[33]  PAUL J. WERBOS,et al.  Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.

[34]  Tim Menzies,et al.  Active learning and effort estimation: Finding the essential content of software effort estimation data , 2013, IEEE Transactions on Software Engineering.

[35]  Max Kuhn,et al.  caret: Classification and Regression Training , 2015 .

[36]  Brendan Murphy,et al.  Can developer-module networks predict failures? , 2008, SIGSOFT '08/FSE-16.

[37]  Qinbao Song,et al.  A General Software Defect-Proneness Prediction Framework , 2011, IEEE Transactions on Software Engineering.

[38]  Qinbao Song,et al.  Using Coding-Based Ensemble Learning to Improve Software Defect Prediction , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[39]  Shane McIntosh,et al.  An empirical study of the impact of modern code review practices on software quality , 2015, Empirical Software Engineering.

[40]  Ying Zou,et al.  Cross-Project Defect Prediction Using a Connectivity-Based Unsupervised Classifier , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[41]  Lech Madeyski,et al.  Towards identifying software project clusters with regard to defect prediction , 2010, PROMISE '10.

[42]  Deepak Goyal,et al.  A hierarchical model for object-oriented design quality assessment , 2015 .

[43]  Lefteris Angelis,et al.  Ranking and Clustering Software Cost Estimation Models through a Multiple Comparisons Algorithm , 2013, IEEE Transactions on Software Engineering.

[44]  F. Wilcoxon,et al.  Individual comparisons of grouped data by ranking methods. , 1946, Journal of economic entomology.

[45]  L. Erlikh,et al.  Leveraging legacy system dollars for e-business , 2000 .

[46]  Brian Henderson-Sellers,et al.  Object-oriented metrics: measures of complexity , 1995 .

[47]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[48]  Yogesh R. Shepal A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data , 2014 .

[49]  Tim Menzies,et al.  Class level fault prediction using software clustering , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[50]  Zachary Chase Lipton A Critical Review of Recurrent Neural Networks for Sequence Learning , 2015, ArXiv.

[51]  Andreas Zeller,et al.  Predicting faults from cached history , 2008, ISEC '08.

[52]  Harald C. Gall,et al.  Don't touch my code!: examining the effects of ownership on software quality , 2011, ESEC/FSE '11.

[53]  Premkumar T. Devanbu,et al.  How, and why, process metrics are better , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[54]  Shane McIntosh,et al.  Automated Parameter Optimization of Classification Techniques for Defect Prediction Models , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[55]  Tim Menzies,et al.  Heterogeneous Defect Prediction , 2018, IEEE Trans. Software Eng..

[56]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[57]  Robert C. Martin,et al.  OO Design Quality Metrics , 1997 .

[58]  Ye Yang,et al.  An investigation on the feasibility of cross-project defect prediction , 2012, Automated Software Engineering.

[59]  Tian Jiang,et al.  Personalized defect prediction , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[60]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[61]  Ronald J. Williams,et al.  Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[62]  Yuming Zhou,et al.  Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models , 2016, SIGSOFT FSE.

[63]  Audris Mockus,et al.  A large-scale empirical study of just-in-time quality assurance , 2013, IEEE Transactions on Software Engineering.

[64]  Tim Menzies,et al.  Learning from Open-Source Projects: An Empirical Study on Defect Prediction , 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[65]  Mei-Hwa Chen,et al.  An empirical study on object-oriented metrics , 1999, Proceedings Sixth International Software Metrics Symposium (Cat. No.PR00403).

[66]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.