An Empirical Study of Feature Ranking Techniques for Software Quality Prediction

The primary goal of software quality engineering is to produce a high quality software product through the use of some specific techniques and processes. One strategy is applying data mining techniques to software metric and defect data collected during the software development process to identify potential low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model is used to predict whether program modules (instances) are fault-prone or not-fault-prone. Seven filter-based feature ranking techniques are examined. Among them, six are commonly used, and the other one, named signal to noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques for various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are...

[1]  Chi-Chun Huang,et al.  A Novel GA-Taguchi-Based Feature Selection Method , 2008, IDEAL.

[2]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[3]  Simon Haykin,et al.  Neural Networks: A Comprehensive Foundation , 1998 .

[4]  Larry A. Rendell,et al.  A Practical Approach to Feature Selection , 1992, ML.

[5]  Elif Derya Übeyli,et al.  Feature saliency using signal-to-noise ratios in automated diagnostic systems developed for Doppler ultrasound signals , 2006, Eng. Appl. Artif. Intell..

[6]  Taghi M. Khoshgoftaar,et al.  Feature Selection with Imbalanced Data for Software Defect Prediction , 2009, 2009 International Conference on Machine Learning and Applications.

[7]  Taghi M. Khoshgoftaar,et al.  Comparative Analysis of DNA Microarray Data through the Use of Feature Selection Techniques , 2010, 2010 Ninth International Conference on Machine Learning and Applications.

[8]  Taghi M. Khoshgoftaar,et al.  Evolutionary Optimization of Software Quality Modeling with Multiple Repositories , 2010, IEEE Transactions on Software Engineering.

[9]  Liang Goh,et al.  A Hybrid Feature Selection Approach for Microarray Gene Expression Data , 2006, International Conference on Computational Science.

[10]  Claes Wohlin,et al.  Experimentation in software engineering: an introduction , 2000 .

[11]  Taghi M. Khoshgoftaar,et al.  Mining Data from Multiple Software Development Projects , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[12]  Barry W. Boehm,et al.  Finding the right data for software cost modeling , 2005, IEEE Software.

[13]  Stan Matwin,et al.  STochFS: A Framework for Combining Feature Selection Outcomes Through a Stochastic Process , 2005, PKDD.

[14]  Taghi M. Khoshgoftaar,et al.  Choosing software metrics for defect prediction: an investigation on feature selection techniques , 2011, Softw. Pract. Exp..

[15]  Saswati Mukherjee,et al.  An Improved Feature Selection using Maximized Signal to Noise Ratio Technique for TC , 2006, Third International Conference on Information Technology: New Generations (ITNG'06).

[16]  Taghi M. Khoshgoftaar,et al.  An Empirical Study of Learning from Imbalanced Data Using Random Forest , 2007 .

[17]  Geoff Holmes,et al.  Benchmarking Attribute Selection Techniques for Discrete Class Data Mining , 2003, IEEE Trans. Knowl. Data Eng..

[18]  Hongfang Liu,et al.  An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules , 2009, IEEE Transactions on Software Engineering.

[19]  R. Mlynarski,et al.  New feature selection methods for qualification of the patients for cardiac pacemaker implantation , 2007, 2007 Computers in Cardiology.

[20]  Huan Liu,et al.  Toward integrating feature selection algorithms for classification and clustering , 2005, IEEE Transactions on Knowledge and Data Engineering.

[21]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[22]  A. Zeller,et al.  Predicting Defects for Eclipse , 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[23]  Witold Pedrycz,et al.  Effective classification using feature selection and fuzzy integration , 2008, Fuzzy Sets Syst..

[24]  Jesús S. Aguilar-Ruiz,et al.  Detecting Fault Modules Applying Feature Selection to Classifiers , 2007, 2007 IEEE International Conference on Information Reuse and Integration.

[25]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[26]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[27]  Yvan Saeys,et al.  Robust Feature Selection Using Ensemble Feature Selection Techniques , 2008, ECML/PKDD.

[28]  Yue Jiang,et al.  Variance Analysis in Software Fault Prediction Models , 2009, 2009 20th International Symposium on Software Reliability Engineering.

[29]  Tian-Yu Liu,et al.  EasyEnsemble and Feature Selection for Imbalance Data Sets , 2009, 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing.

[30]  George Forman,et al.  An Extensive Empirical Study of Feature Selection Metrics for Text Classification , 2003, J. Mach. Learn. Res..

[31]  Taghi M. Khoshgoftaar,et al.  ATTRIBUTE SELECTION USING ROUGH SETS IN SOFTWARE QUALITY CLASSIFICATION , 2009 .

[32]  Jesús S. Aguilar-Ruiz,et al.  Attribute Selection in Software Engineering Datasets for Detecting Fault Modules , 2007, 33rd EUROMICRO Conference on Software Engineering and Advanced Applications (EUROMICRO 2007).

[33]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .