An empirical evaluation of NASA-MDP data sets using a genetic defect-proneness prediction framework

Software quality is an important research area in software engineering. The automated generation of learning schemes offers an efficient way to detect defects in software projects, helping to avoid high costs and long delivery times. This study carries out an empirical evaluation of two versions of the NASA-MDP data sets, which differ in their levels of noise, in order to assess the stability of our genetic defect-proneness prediction framework. In all, 864 learning schemes were studied (8 data preprocessors × 6 attribute selectors × 18 learning algorithms). According to statistical tests, the framework produced stable results across the two versions: performance in the evaluation and prediction phases was similar, and it remained stable between data set versions. In other words, the differences between the versions did not affect the performance of our framework.
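
To make the experimental design concrete, the following minimal sketch enumerates the 864 learning schemes and runs a paired comparison between two data set versions. The component names and performance scores are hypothetical placeholders, and the Wilcoxon signed-rank test is shown only as one plausible paired test; the abstract does not name the framework's actual components or statistical tests.

    import random
    from itertools import product

    from scipy.stats import wilcoxon

    # Hypothetical component pools: the framework combines 8 data preprocessors,
    # 6 attribute selectors, and 18 learning algorithms. The names below are
    # placeholders, not the paper's actual components.
    preprocessors = [f"prep_{i}" for i in range(8)]
    selectors = [f"sel_{i}" for i in range(6)]
    learners = [f"learn_{i}" for i in range(18)]

    # A learning scheme is one (preprocessor, attribute selector, learner) triple.
    schemes = list(product(preprocessors, selectors, learners))
    assert len(schemes) == 8 * 6 * 18 == 864

    # Simulated per-scheme performance on the two data set versions; real values
    # would come from the framework's evaluation and prediction phases.
    random.seed(0)
    scores_v1 = [random.uniform(0.6, 0.9) for _ in schemes]
    scores_v2 = [s + random.gauss(0.0, 0.01) for s in scores_v1]

    # Paired test over per-scheme differences between versions; a large p-value
    # is consistent with the claim that the versions do not affect performance.
    stat, p_value = wilcoxon(scores_v1, scores_v2)
    print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.3f}")

In this setup, stability between versions would show up as small per-scheme score differences and a non-significant test result.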
