Reducing Features to Improve Code Change-Based Bug Prediction

Machine learning classifiers have recently emerged as a way to predict the introduction of bugs in changes made to source code files. The classifier is first trained on software history and then used to predict whether an impending change will introduce a bug. Drawbacks of existing classifier-based bug prediction techniques are insufficient performance for practical use and slow prediction times caused by a large number of machine-learned features. This paper investigates multiple feature selection techniques that are generally applicable to classification-based bug prediction methods. The techniques discard less important features until optimal classification performance is reached. The total number of features used for training is substantially reduced, often to less than 10 percent of the original. The performance of Naive Bayes and Support Vector Machine (SVM) classifiers when using these techniques is characterized on 11 software projects. Naive Bayes using feature selection provides a significant improvement in buggy F-measure (21 percent) over prior change classification bug prediction results (by the second and fourth authors [28]). The SVM's improvement in buggy F-measure is 9 percent. Interestingly, an analysis of performance for varying numbers of features shows that strong performance is achieved with even 1 percent of the original number of features.
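As a rough illustration of the process the abstract describes (rank features by importance, then keep only the fraction of top-ranked features that maximizes buggy F-measure), the following Python sketch uses scikit-learn. It is a minimal sketch under stated assumptions, not the authors' implementation: the chi-squared ranking, the candidate fractions, and the helper name select_features are illustrative choices, and the paper itself evaluates several ranking and selection methods.

import numpy as np
from sklearn.feature_selection import chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def select_features(X, y, fractions=(1.0, 0.5, 0.2, 0.1, 0.05, 0.01)):
    """Illustrative feature selection: X is a non-negative term-count
    matrix of change features, y marks buggy (1) vs. clean (0) changes."""
    scores, _ = chi2(X, y)               # filter-style importance scores
    scores = np.nan_to_num(scores)       # guard against undefined scores
    order = np.argsort(scores)[::-1]     # most important features first
    best_f1, best_idx = -1.0, order
    for frac in fractions:
        k = max(1, int(frac * X.shape[1]))
        idx = order[:k]                  # keep only the top-k features
        # F-measure of the positive (buggy) class, averaged over 10 folds
        f1 = cross_val_score(MultinomialNB(), X[:, idx], y,
                             scoring="f1", cv=10).mean()
        if f1 > best_f1:
            best_f1, best_idx = f1, idx
    return best_idx, best_f1

The sketch sweeps feature counts downward rather than assuming more features help, reflecting the abstract's finding that performance often peaks below 10 percent of the original features, and sometimes at 1 percent.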

[1]  Lerina Aversano, et al. Learning from bug-introducing changes to prevent fault prone code, 2007, IWPSE '07.

[2]  Igor Kononenko, et al. Estimating Attributes: Analysis and Extensions of RELIEF, 1994, ECML.

[3]  Dawson R. Engler, et al. A few billion lines of code later, 2010, Commun. ACM.

[4]  Elaine J. Weyuker, et al. Predicting the location and number of faults in large software systems, 2005, IEEE Transactions on Software Engineering.

[5]  Tim Menzies, et al. Data Mining Static Code Attributes to Learn Defect Predictors, 2007, IEEE Transactions on Software Engineering.

[6]  Audris Mockus, et al. Predicting risk of software changes, 2000, Bell Labs Technical Journal.

[7]  Philip Ball, et al. The missing links, 2000.

[8]  Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.

[9]  Tom Fawcett, et al. An introduction to ROC analysis, 2006, Pattern Recognit. Lett.

[10]  Taghi M. Khoshgoftaar, et al. Choosing software metrics for defect prediction: an investigation on feature selection techniques, 2011, Softw. Pract. Exp.

[11]  Alberto Maria Segre, et al. Programs for Machine Learning, 1994.

[12]  Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[13]  Thomas Zimmermann, et al. Preprocessing CVS Data for Fine-Grained Analysis, 2004, MSR.

[14]  Taghi M. Khoshgoftaar, et al. Predicting the order of fault-prone modules in legacy software, 1998, Proceedings Ninth International Symposium on Software Reliability Engineering.

[15]  Premkumar T. Devanbu, et al. Fair and balanced?: bias in bug-fix datasets, 2009, ESEC/FSE '09.

[16]  Michael W. Godfrey, et al. Facilitating software evolution research with Kenyon, 2005, ESEC/FSE-13.

[17]  Audris Mockus, et al. Identifying reasons for software changes using historic databases, 2000, Proceedings 2000 International Conference on Software Maintenance.

[18]  Taghi M. Khoshgoftaar, et al. Ordering Fault-Prone Software Modules, 2003, Software Quality Journal.

[19]  Andrei Z. Broder, et al. Effective and efficient classification on a search-engine model, 2007, Knowledge and Information Systems.

[20]  Stan Matwin, et al. Feature Engineering for Text Classification, 1999, ICML.

[21]  Ron Kohavi, et al. Feature Selection for Knowledge Discovery and Data Mining, 1998.

[22]  Osamu Mizuno, et al. An extension of fault-prone filtering using precise training and a dynamic threshold, 2008, MSR '08.

[23]  Witold Pedrycz, et al. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[24]  Ahmed E. Hassan, et al. Understanding the impact of code and process metrics on post-release defects: a case study on the Eclipse project, 2010, ESEM '10.

[25]  Michele Lanza, et al. Evaluating defect prediction approaches: a benchmark and an extensive comparison, 2011, Empirical Software Engineering.

[26]  Premkumar T. Devanbu, et al. The missing links: bugs and bug-fix commits, 2010, FSE '10.

[27]  Trevor Hastie, et al. The Elements of Statistical Learning, 2001.

[28]  Andreas Zeller, et al. When do changes induce fixes?, 2005, ACM SIGSOFT Softw. Eng. Notes.

[29]  Z. Birnbaum, et al. One-Sided Confidence Contours for Probability Distribution Functions, 1951.

[30]  Yi Zhang, et al. Classifying Software Changes: Clean or Buggy?, 2008, IEEE Transactions on Software Engineering.

[31]  E. James Whitehead, et al. Predicting buggy changes inside an integrated development environment, 2007, eclipse '07.

[32]  Karim O. Elish, et al. Predicting defect-prone software modules using support vector machines, 2008, J. Syst. Softw.

[33]  Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.

[34]  Sandro Morasca, et al. A hybrid approach to analyze empirical software engineering data and its application to predict module fault-proneness in maintenance, 2000, J. Syst. Softw.

[35]  Sunghun Kim, et al. Reducing Features to Improve Bug Prediction, 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[36]  David D. Lewis, et al. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval, 1998, ECML.

[37]  Gilbert Chin. Fair and Balanced, 2011.

[38]  Thorsten Joachims, et al. Training linear SVMs in linear time, 2006, KDD '06.

[39]  Chinatsu Aone, et al. Fast and effective text mining using linear-time document clustering, 1999, KDD '99.

[40]  Tibor Gyimóthy, et al. Empirical validation of object-oriented metrics on open source software for fault prediction, 2005, IEEE Transactions on Software Engineering.

[41]  Geoff Holmes, et al. Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, 2003, IEEE Trans. Knowl. Data Eng.

[42]  Miryung Kim, et al. Validity concerns in software engineering research, 2010, FoSER '10.

[43]  J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[44]  Bart Baesens, et al. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings, 2008, IEEE Transactions on Software Engineering.

[45]  Andrew McCallum, et al. A comparison of event models for naive Bayes text classification, 1998, AAAI 1998.

[46]  Marko Robnik-Sikonja, et al. Theoretical and Empirical Analysis of ReliefF and RReliefF, 2003, Machine Learning.

[47]  Venkata U. B. Challagulla, et al. Empirical assessment of machine learning based software defect prediction techniques, 2005, 10th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems.

[48]  F. Massey. The Kolmogorov-Smirnov Test for Goodness of Fit, 1951.

[49]  M. Kenward, et al. An Introduction to the Bootstrap, 2007.

[50]  Lionel C. Briand, et al. Investigating quality factors in object-oriented designs: an industrial case study, 1999, Proceedings of the 1999 International Conference on Software Engineering.

[51]  Richard C. Holt, et al. The top ten list: dynamic fault prediction, 2005, 21st IEEE International Conference on Software Maintenance (ICSM'05).

[52]  Lipika Dey, et al. A feature selection technique for classificatory analysis, 2005, Pattern Recognit. Lett.

[53]  Qinbao Song, et al. A General Software Defect-Proneness Prediction Framework, 2011, IEEE Transactions on Software Engineering.

[54]  金田 重郎, et al. C4.5: Programs for Machine Learning (book review), 1995.

[55]  Andreas Zeller, et al. Predicting faults from cached history, 2008, ISEC '08.