A Genetic Algorithm to Configure Support Vector Machines for Predicting Fault-Prone Components

In some studies, Support Vector Machines (SVMs) have turned out to be promising for predicting fault-prone software components. Nevertheless, the performance of the method depends on the setting of some parameters. To address this issue, we propose the use of a Genetic Algorithm (GA) to search for a suitable configuration of SVM parameters that allows us to obtain optimal prediction performance. The approach has been assessed by carrying out an empirical analysis based on jEdit data from the PROMISE repository. We analyzed both the inter- and the intra-release performance of the proposed method. As benchmarks, we used SVMs with Grid-search and several other machine learning techniques. The results show that the proposed approach improves prediction performance, increasing Recall without worsening Precision. This behavior was especially remarkable in the inter-release setting when compared with the other prediction techniques.
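The idea of letting a GA search the SVM parameter space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the function names `genetic_search` and `toy_fitness`, the choice of two genes (log2 of the penalty C and log2 of the kernel width gamma), the operators (tournament selection, blend crossover, Gaussian mutation, elitism), and all default settings. In particular, `toy_fitness` is only a stand-in; in a real experiment the fitness would be a prediction score (for example one based on Recall and Precision) obtained by training and validating an SVM with the candidate parameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Maximise `fitness` over real-valued genes constrained to `bounds`.

    `bounds` is a list of (low, high) pairs, one per gene.
    Returns the best individual evaluated and its fitness.
    """
    rng = random.Random(seed)

    def tournament(scored):
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")

    for gen in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        for score, ind in scored:
            if score > best_score:
                best, best_score = list(ind), score
        if gen == generations - 1:
            break  # nothing left to evaluate, so skip breeding

        next_pop = [list(best)]  # elitism: carry the best individual over
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # blend crossover between the two parents
                child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            next_pop.append(child)
        population = next_pop

    return best, best_score


def toy_fitness(genes):
    # Placeholder for a validated SVM score (assumption): a smooth surface
    # peaking at log2(C) = 3 and log2(gamma) = -5.
    log2_c, log2_gamma = genes
    return -((log2_c - 3) ** 2 + (log2_gamma + 5) ** 2)


# Hypothetical search ranges for log2(C) and log2(gamma).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]

if __name__ == "__main__":
    best, score = genetic_search(toy_fitness, BOUNDS)
    print("log2(C), log2(gamma):", best, "fitness:", score)
```

Searching in log2 space mirrors the exponentially spaced grids commonly used for Grid-search, so both strategies would explore the same ranges; unlike a fixed grid, the GA concentrates its evaluations around regions that have already scored well.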
