Leveraging developer information for efficient effort-aware bug prediction

Abstract

Context: Software bug prediction techniques can provide informative guidance in software engineering practice. Over the past 15 years, developer information has been used intensively in bug prediction, either as features or as a basic data source for constructing other models.

Objective: To further leverage developer information, from a new and straightforward perspective, to improve effort-aware bug prediction.

Methods: We investigate the direct relation between the number of developers and the probability that a file is buggy. In an empirical study of nine open-source Java systems with 32 versions, we observe a widespread tendency: the more developers work on a source file, the more likely the file is to be buggy. Based on this tendency, we propose an unsupervised algorithm and a supervised equation, both called top-dev, to improve effort-aware bug prediction. The key idea is to move files with a large number of developers higher in the suspicious file list generated by effort-aware models.

Results: Experimental results show that the proposed top-dev algorithm and equation significantly outperform the unsupervised and supervised baseline models (ManualUp, R_ad, R_dd, R_ee, CBS+, and top-core). Moreover, the unsupervised top-dev algorithm is comparable or superior to the existing supervised baseline models.

Conclusion: The proposed approaches are useful in effort-aware bug prediction practice. Practitioners can use the top-dev algorithm to generate a high-quality and informative suspicious file list without training complex machine learning classifiers. When building a supervised bug prediction model, the best practice is to combine existing models with the top-dev equation.
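The key idea in the Methods paragraph can be illustrated with a minimal sketch. It is not the top-dev algorithm itself, whose details the abstract does not give. The sketch assumes a baseline that ranks files by ascending lines of code (in the spirit of ManualUp) and a hypothetical threshold `min_devs` above which a file is promoted; `SourceFile`, `size_ascending_rank`, `promote_many_dev_files`, and the sample data are all illustrative names and values, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceFile:
    path: str
    loc: int      # lines of code, used here as the inspection effort
    n_devs: int   # number of distinct developers who changed the file


def size_ascending_rank(files: List[SourceFile]) -> List[SourceFile]:
    """Assumed baseline: smaller files first (ManualUp-like ordering)."""
    return sorted(files, key=lambda f: f.loc)


def promote_many_dev_files(ranked: List[SourceFile], min_devs: int) -> List[SourceFile]:
    """Move files with at least `min_devs` developers to the front.

    The relative order inside each group is kept, so the baseline ranking
    still decides ties. `min_devs` is a hypothetical parameter.
    """
    promoted = [f for f in ranked if f.n_devs >= min_devs]
    rest = [f for f in ranked if f.n_devs < min_devs]
    return promoted + rest


# Made-up sample data, for illustration only.
files = [
    SourceFile("A.java", loc=120, n_devs=1),
    SourceFile("B.java", loc=300, n_devs=4),
    SourceFile("C.java", loc=80, n_devs=2),
    SourceFile("D.java", loc=500, n_devs=5),
]

baseline = size_ascending_rank(files)
reranked = promote_many_dev_files(baseline, min_devs=3)

print([f.path for f in baseline])  # ['C.java', 'A.java', 'B.java', 'D.java']
print([f.path for f in reranked])  # ['B.java', 'D.java', 'C.java', 'A.java']
```

Splitting the list into two groups, rather than re-sorting on developer count alone, keeps the effort-aware ordering intact inside each group. Whether the paper uses a threshold, a weighted score, or another rule is not stated in the abstract.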
