Multiview Transfer Learning for Software Defect Prediction

Most software defect prediction models assume that enough labeled historical training instances are available, and that the training data and the instances to be predicted share the same features. In practice, however, many datasets differ in granularity and describe software in different feature dimensions. It is therefore valuable to use such small-scale data with differing dimensions as training instances to improve prediction performance. We propose a multiview transfer learning method for software defect prediction on heterogeneous data, denoted MTDP, which lets features of different dimensions and granularities be used to learn labels automatically through neural network models. This multiview transfer method supplies a large number of training instances to the software defect prediction model while helping to ensure that the training labels are effective. MTDP has four main stages: 1) build heterogeneous transfer models; 2) transfer heterogeneous instances to generate quasi-real instances; 3) label the quasi-real instances through co-training and then expand the training set; and 4) construct improved software defect prediction models. The experimental results show that quasi-real instances have effects similar to those of real instances, and that introducing them into the training dataset improves software defect prediction performance.
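Stage 3 (labeling quasi-real instances through co-training) can be illustrated with a minimal sketch. This is not the MTDP implementation: the abstract says the method relies on neural network models, whereas this sketch swaps in a simple nearest-centroid classifier per view to stay self-contained. The function names, the confidence measure, the threshold, and the toy data are all assumptions made for illustration, and the sketch assumes a binary (clean versus defective) label set.

```python
from statistics import mean

def centroid_fit(X, y):
    """Compute one centroid per class from feature vectors X and labels y."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents

def centroid_predict(cents, x):
    """Return (label, confidence); assumes at least two classes.

    Confidence is the normalized margin between the two nearest centroids
    (an illustrative choice, not taken from the paper).
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in cents.items()
    )
    (d0, l0), (d1, _) = dists[0], dists[1]
    return l0, (d1 - d0) / (d1 + d0 + 1e-12)

def co_train(view_a, view_b, labels, unl_a, unl_b, rounds=5, threshold=0.5):
    """Pseudo-label unlabeled instances using two views of the same data.

    Each round, one classifier per view is refit; an unlabeled instance is
    accepted when the more confident view exceeds the threshold, and is then
    added to the training set of both views.
    """
    Xa, Xb, y = list(view_a), list(view_b), list(labels)
    pending = set(range(len(unl_a)))
    assigned = {}
    for _ in range(rounds):
        if not pending:
            break
        ca, cb = centroid_fit(Xa, y), centroid_fit(Xb, y)
        newly = {}
        for i in pending:
            la, conf_a = centroid_predict(ca, unl_a[i])
            lb, conf_b = centroid_predict(cb, unl_b[i])
            label, conf = (la, conf_a) if conf_a >= conf_b else (lb, conf_b)
            if conf >= threshold:
                newly[i] = label
        if not newly:
            break  # no confident predictions left; stop early
        for i, label in newly.items():
            Xa.append(unl_a[i])
            Xb.append(unl_b[i])
            y.append(label)
            assigned[i] = label
        pending -= set(newly)
    return assigned

# Toy data (hypothetical): view A has two features, view B has one.
# Label 0 = clean, 1 = defective.
lab_a = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
lab_b = [[0.1], [0.2], [0.9], [1.0]]
lab_y = [0, 0, 1, 1]
quasi_a = [[1.1, 0.9], [5.1, 5.1], [3.0, 3.0]]
quasi_b = [[0.15], [0.95], [0.55]]

pseudo = co_train(lab_a, lab_b, lab_y, quasi_a, quasi_b)
print(pseudo)
```

In this toy run, the two instances that lie close to a class centroid receive pseudo-labels, while the third, which sits midway between the classes in both views, stays unlabeled. Only confidently labeled instances would then expand the training set.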
