Cross Version Defect Prediction with Class Dependency Embeddings

Software Defect Prediction aims to identify the software modules most likely to contain defects, saving time during development by helping find bugs early. Defect prediction models are trained on historical data; in particular, one can use data collected from past releases, or versions, of the same application under analysis. Defect prediction based on past versions is called Cross Version Defect Prediction (CVDP). Traditionally, static code metrics are used to predict defects. In this work, we use the Class Dependency Network (CDN) as an additional predictor of defects, combined with static code metrics. CDN data captures structural information about the application being analyzed. It is usually analyzed with handcrafted network measures, such as Social Network metrics. Our approach instead uses network embedding techniques to leverage CDN information without building the metrics manually. Because embeddings are learned separately for each version, we incorporate embedding alignment techniques to make them comparable between versions. To evaluate our approach, we performed experiments on 24 software release pairs and compared it against several benchmark methods. In these experiments, we analyzed the performance of two graph embedding techniques, three anchor selection approaches, and two alignment techniques. We also built a meta-model based on two different embeddings and achieved a statistically significant improvement in AUC of 4.7% (p<0.002) over the baseline method.
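
To make the alignment step concrete, the following is a minimal sketch of one common alignment technique, orthogonal Procrustes, applied to two embedding matrices. The abstract does not name the alignment techniques used, so this is an illustrative assumption rather than the authors' method. The function name `procrustes_align`, the synthetic data, and the assumption that row i of both matrices describes the same class are all hypothetical.

```python
import numpy as np

def procrustes_align(source, target, anchor_idx):
    """Rotate `source` embeddings into the space of `target`.

    Assumes row i of both matrices belongs to the same class, and that
    `anchor_idx` lists the rows (anchor classes) used to fit the rotation.
    """
    a = source[anchor_idx]
    b = target[anchor_idx]
    # Orthogonal Procrustes: minimise ||a @ w - b||_F subject to w.T @ w = I.
    # The minimiser is w = u @ vt, where u, s, vt is the SVD of a.T @ b.
    u, _, vt = np.linalg.svd(a.T @ b)
    w = u @ vt
    return source @ w, w

# Synthetic stand-in for two versions: the "new" embedding is an exact
# rotation of the "old" one, so alignment should recover it.
rng = np.random.default_rng(0)
old = rng.normal(size=(6, 4))                  # 6 classes, 4 dimensions
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
new = old @ rotation

aligned, w = procrustes_align(old, new, anchor_idx=[0, 1, 2, 3, 4])
```

In this toy setting the class left out of the anchor set (row 5) is mapped correctly as well, because the two spaces differ only by a rotation. Real embeddings trained independently on two releases will align only approximately, which is presumably why the choice of anchors matters.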
