Paper information - Cross-project Defect Prediction via ASTToken2Vec and BLSTM-based Neural Network

Cross-project defect prediction (CPDP), a means of focusing the quality assurance effort of software projects, has been under heavy investigation in recent years. In this paper, we propose a novel CPDP approach based on deep learning. In particular, we model each program module as a simplified abstract syntax tree (S-AST). For each node in the S-AST, only the project-independent node type is retained, and other project-specific information (such as variable and method names) is ignored, so that the modeling method is project-independent and suitable for CPDP. We then extract token sequences from the program modules modeled as S-ASTs. In addition, to construct meaningful vector representations for the token sequences, we propose a novel unsupervised embedding method, ASTToken2Vec, which learns semantic information from the natural structure of the S-AST. Finally, we use a BLSTM (bi-directional long short-term memory) based neural network to automatically learn semantic features from the vectorized token sequences and construct CPDP models. In our empirical studies, we choose 10 large-scale, real-world open source Java projects as subjects. The results show that our proposed CPDP approach performs significantly better than 5 state-of-the-art CPDP baselines in terms of AUC.
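To make the S-AST idea concrete, the following minimal sketch shows how keeping only node types produces a token sequence that does not depend on identifier names. It is an illustration under stated assumptions, not the paper's implementation: it uses Python's standard `ast` module as a stand-in for a Java parser, and the function name `node_type_sequence` and the choice of a depth-first pre-order traversal are hypothetical choices made for this example.

```python
import ast


def node_type_sequence(source):
    """Return the AST node type names of `source` in depth-first pre-order.

    Only the node type is kept; identifiers and literal values are dropped,
    which mirrors the project-independent modeling described in the abstract.
    (Assumption: Python's `ast` stands in for a Java AST parser here.)
    """
    tokens = []

    def visit(node):
        # Skip Load/Store/Del context markers, which are Python-specific detail.
        if isinstance(node, ast.expr_context):
            return
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


# Two modules that differ only in project-specific names.
module_a = "def add(a, b):\n    return a + b\n"
module_b = "def plus(x, y):\n    return x + y\n"

seq_a = node_type_sequence(module_a)
seq_b = node_type_sequence(module_b)

print(seq_a)
print(seq_a == seq_b)
```

Because the names `add`, `plus`, `a`, and `x` never enter the sequence, both modules map to the same tokens. In the approach described above, sequences of this kind would then be embedded (ASTToken2Vec) and fed to the BLSTM-based network; those stages are not sketched here.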
