Increasing the Prediction Quality of Software Defective Modules with Automatic Feature Engineering

This paper reviews the main concepts of software testing, its difficulties, and the impossibility of testing software completely. It then proposes an approach for predicting which modules are defective, so that the usually limited testing resources can be distributed wisely to maximize coverage of the modules most prone to defects. The approach employs the recently proposed Kaizen Programming (KP) to automatically discover high-quality nonlinear combinations of the original features of a database for use by the classification technique, replacing a human in the feature engineering process. Using an open NASA dataset with software metrics for over 9500 modules, the experimental analysis shows that the new features can significantly boost the detection of defective modules, allowing testers to find 216% more defects than with random module selection; this is also an improvement of 1% over the original features.
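To make the "more defects than with random module selection" comparison concrete, the following minimal sketch shows one way such a gain could be computed. It assumes testers inspect the k modules with the highest predicted defect scores, and that random selection of k modules finds, on average, k times the overall defect rate. The function names, the toy scores, and the toy labels are all hypothetical; this text does not state the paper's exact evaluation protocol, so this is an illustration rather than the authors' procedure.

```python
from typing import Sequence


def defects_found_in_top_k(scores: Sequence[float], labels: Sequence[int], k: int) -> int:
    """Count defective modules (label 1) among the k highest-scored modules."""
    # Rank module indices by predicted score, highest first (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k])


def gain_over_random(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Percentage of additional defects found versus the random-selection expectation."""
    if len(scores) != len(labels) or not 0 < k <= len(labels):
        raise ValueError("scores and labels must match, and 0 < k <= number of modules")
    # Expected defects when k modules are picked uniformly at random.
    expected_random = k * sum(labels) / len(labels)
    if expected_random == 0:
        raise ValueError("no defective modules, so the gain is undefined")
    found = defects_found_in_top_k(scores, labels, k)
    return 100.0 * (found - expected_random) / expected_random


# Toy data (hypothetical): 10 modules, 2 of them defective.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02, 0.01]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_gain = gain_over_random(toy_scores, toy_labels, k=2)
```

With these toy values, inspecting the top 2 modules finds both defects, while random selection of 2 modules is expected to find 0.4, so the gain is about 400%. Under this reading, a gain of 216% would mean finding roughly 3.16 times as many defects as random selection for the same inspection budget.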
