Effort-Aware Tri-Training for Semi-supervised Just-in-Time Defect Prediction

In recent years, just-in-time (JIT) defect prediction has gained considerable interest as it enables developers to identify risky changes at check-in time. Previous studies tried to conduct research from both supervised and unsupervised perspectives. Since the label of change is hard to acquire, it would be more desirable for applications if a prediction model doesn’t highly rely on the label information. However, the performance of the unsupervised models proposed by previous work isn’t good in terms of precision and F1 due to the lack of supervised information. To overcome this weakness, we try to study the JIT defect prediction from the semi-supervised perspective, which only requires a few labeled data for training. In this paper, we propose an Effort-Aware Tri-Training (EATT) semi-supervised model for JIT defect prediction based on sample selection. We compare EATT with the state-of-the-art supervised and unsupervised models with respect to different labeled rates. The experimental results on six open-source projects demonstrate that EATT performs better than existing supervised and unsupervised models for effort-aware JIT defect prediction.

[1]  Xiao-Yuan Jing,et al.  Label propagation based semi-supervised learning for software defect prediction , 2016, Automated Software Engineering.

[2]  Zhi-Hua Zhou,et al.  Software defect detection with rocus , 2011 .

[3]  Xinli Yang,et al.  Deep Learning for Just-in-Time Defect Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[4]  Qinbao Song,et al.  A General Software Defect-Proneness Prediction Framework , 2011, IEEE Transactions on Software Engineering.

[5]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[6]  Qing Li,et al.  Three-way decisions based software defect prediction , 2016, Knowl. Based Syst..

[7]  Yuming Zhou,et al.  Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models , 2016, SIGSOFT FSE.

[8]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[9]  Zhi-Hua Zhou,et al.  Tri-training: exploiting unlabeled data using three classifiers , 2005, IEEE Transactions on Knowledge and Data Engineering.

[10]  Audris Mockus,et al.  A large-scale empirical study of just-in-time quality assurance , 2013, IEEE Transactions on Software Engineering.

[11]  Xiao-Yuan Jing,et al.  Progress on approaches to software defect prediction , 2018, IET Softw..

[12]  Zhi-Hua Zhou,et al.  Semi-supervised learning by disagreement , 2010, Knowledge and Information Systems.

[13]  O. Chapelle,et al.  Semi-Supervised Learning (Chapelle, O. et al., Eds.; 2006) [Book reviews] , 2009, IEEE Transactions on Neural Networks.

[14]  Audris Mockus,et al.  Predicting risk of software changes , 2000, Bell Labs Technical Journal.

[15]  Bojan Cukic,et al.  Software defect prediction using semi-supervised learning with dimension reduction , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[16]  Bojan Cukic,et al.  An iterative semi-supervised approach to software fault prediction , 2011, Promise '11.

[17]  D. Angluin,et al.  Learning From Noisy Examples , 1988, Machine Learning.

[18]  Xiang Chen,et al.  MULTI: Multi-objective effort-aware just-in-time software defect prediction , 2018, Inf. Softw. Technol..

[19]  David Lo,et al.  Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[20]  Yuming Zhou,et al.  Code Churn: A Neglected Metric in Effort-Aware Just-in-Time Defect Prediction , 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[21]  Naoyasu Ubayashi,et al.  Studying just-in-time defect prediction using cross-project models , 2015, Empirical Software Engineering.

[22]  Tim Menzies,et al.  Revisiting unsupervised learning for defect prediction , 2017, ESEC/SIGSOFT FSE.

[23]  Licheng Jiao,et al.  Semi-Supervised Deep Fuzzy C-Mean Clustering for Software Fault Prediction , 2018, IEEE Access.

[24]  Xiaojin Zhu,et al.  Semi-Supervised Learning , 2010, Encyclopedia of Machine Learning.

[25]  Zhi-Hua Zhou,et al.  Sample-based software defect prediction with active and semi-supervised learning , 2012, Automated Software Engineering.

[26]  Osamu Mizuno,et al.  Bug prediction based on fine-grained module histories , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[27]  Xinli Yang,et al.  TLEL: A two-layer ensemble learning approach for just-in-time defect prediction , 2017, Inf. Softw. Technol..