Just-in-time defect prediction for Android apps via imbalanced deep learning model

Android mobile apps play an important role in daily life and work. To meet new user requirements, these apps are updated frequently, which involves a large number of code commits. Previous studies proposed applying Just-in-Time (JIT) defect prediction to mobile apps to identify in a timely manner whether new code commits introduce defects, with the aim of assuring app quality. In general, defective commit instances are far fewer than clean ones; in other words, the defect data are class imbalanced. In this work, we propose a novel Imbalanced Deep Learning model, called IDL, for the JIT defect prediction task on Android mobile apps. More specifically, we introduce a state-of-the-art cost-sensitive cross-entropy loss function into a deep neural network that learns a high-level feature representation; the loss function alleviates the class imbalance issue by taking the prior probabilities of the two classes into account. We conduct experiments on a benchmark defect dataset consisting of 12 Android mobile apps. Rigorous experiments show that the proposed IDL model performs significantly better than 23 comparative imbalanced learning methods in terms of the Matthews correlation coefficient (MCC).
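To make the two technical ingredients named above concrete, the following is a minimal sketch of a prior-weighted cross-entropy loss and of the Matthews correlation coefficient. It is not the IDL implementation: the abstract does not give the exact loss formula, so the weighting scheme used here (each class weighted by the prior probability of the opposite class, so that the rarer defective class costs more) is an assumption, and the function names `class_priors`, `weighted_cross_entropy`, and `mcc` are hypothetical.

```python
import math


def class_priors(labels):
    """Return (prior of clean class, prior of defective class) for 0/1 labels."""
    p_pos = sum(labels) / len(labels)
    return 1.0 - p_pos, p_pos


def weighted_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy with class weights derived from class priors.

    Assumption (not from the paper): the defective class is weighted by the
    prior of the clean class and vice versa, so the minority class gets the
    larger weight.
    """
    p_neg, p_pos = class_priors(labels)
    w_pos, w_neg = p_neg, p_pos
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def mcc(labels, preds):
    """Matthews correlation coefficient for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for y, z in zip(labels, preds) if y == 1 and z == 1)
    tn = sum(1 for y, z in zip(labels, preds) if y == 0 and z == 0)
    fp = sum(1 for y, z in zip(labels, preds) if y == 0 and z == 1)
    fn = sum(1 for y, z in zip(labels, preds) if y == 1 and z == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    labels = [1, 0, 0, 0]  # one defective commit among four (toy data)
    missed_defect = weighted_cross_entropy(labels, [0.1, 0.1, 0.1, 0.1])
    false_alarm = weighted_cross_entropy(labels, [0.9, 0.9, 0.1, 0.1])
    print(missed_defect > false_alarm)  # missing the rare class costs more
    print(mcc(labels, [1, 0, 0, 0]))
```

In this toy setting, confidently missing the single defective commit is penalized more heavily than confidently flagging one clean commit, which is the behaviour a cost-sensitive loss is meant to produce on imbalanced data. MCC uses all four cells of the confusion matrix, which is why it is often preferred for imbalanced evaluation.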
