Effort-aware just-in-time defect identification in practice: a case study at Alibaba

Effort-aware Just-in-Time (JIT) defect identification aims to identify defect-introducing changes at commit time, under a limited code-inspection budget. Compared with traditional module-level defect identification, it offers two benefits: defects are identified in a more cost-effective and more timely manner. Researchers have recently proposed various effort-aware JIT defect identification approaches, both supervised (e.g., CBS+ and OneWay) and unsupervised (e.g., LT and Code Churn), and the comparison between these two families of approaches has attracted considerable research interest. However, the effectiveness of these recently proposed approaches, and how they compare with one another, has never been investigated in an industrial setting. In this paper, we investigate the effectiveness of state-of-the-art effort-aware JIT defect identification approaches in an industrial setting. To that end, we conduct a case study on 14 Alibaba projects comprising 196,790 changes. Our case study investigates three aspects: (1) the effectiveness of state-of-the-art supervised (i.e., CBS+, OneWay, and EALR) and unsupervised (i.e., LT and Code Churn) effort-aware JIT defect identification approaches on Alibaba projects, (2) the importance of the features used in the effort-aware JIT defect identification approaches, and (3) the association between project-specific factors and the likelihood of a change being defective. Moreover, we develop a tool based on the best-performing approach and investigate the tool's effectiveness in a real-life setting at Alibaba.
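To make the two families of approaches concrete: the unsupervised approaches named above rank changes by simple size metrics so that smaller changes are inspected first (Code Churn uses lines added plus lines deleted), while a supervised approach such as CBS+ first classifies changes and then sorts the predicted-defective ones by size. The following is a minimal Python sketch of this idea under a 20% inspection-effort budget; the `Change` structure, its field names, and the 20% cutoff are illustrative assumptions for exposition, not the paper's implementation or feature set.

```python
from dataclasses import dataclass

@dataclass
class Change:
    """A code change; fields are illustrative, not the paper's schema."""
    change_id: str
    lines_added: int
    lines_deleted: int

def churn(c: Change) -> int:
    """Code churn of a change: lines added plus lines deleted."""
    return c.lines_added + c.lines_deleted

def rank_by_churn(changes: list[Change]) -> list[Change]:
    """Unsupervised 'Code Churn'-style ranking: inspect smaller changes
    first, since they cost less inspection effort per change reviewed."""
    return sorted(changes, key=churn)

def select_within_budget(changes: list[Change],
                         effort_ratio: float = 0.2) -> list[Change]:
    """Pick top-ranked changes whose cumulative churn stays within a
    fraction (here 20%, a common evaluation cutoff) of total effort."""
    budget = effort_ratio * sum(churn(c) for c in changes)
    selected, spent = [], 0
    for c in rank_by_churn(changes):
        if spent + churn(c) > budget:
            break
        selected.append(c)
        spent += churn(c)
    return selected

def cbs_plus_style_rank(changes: list[Change],
                        predicted_defective) -> list[Change]:
    """CBS+-style ('classify-before-sorting') sketch: keep only the
    changes a trained classifier flags as defective, then inspect the
    smaller ones first. The classifier itself is omitted here."""
    flagged = [c for c in changes if predicted_defective(c)]
    return sorted(flagged, key=churn)

if __name__ == "__main__":
    sample = [
        Change("c1", 5, 2),
        Change("c2", 300, 120),
        Change("c3", 12, 4),
    ]
    for c in select_within_budget(sample):
        print(c.change_id, churn(c))
```

Running the example selects `c1` and `c3`: the large change `c2` alone would exceed the 20% effort budget, which is precisely the cost-effectiveness argument for effort-aware ranking.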
