Improving effort estimation of Fuzzy Analogy using feature subset selection

Feature selection has recently been used in software development effort estimation to improve the accuracy and robustness of prediction techniques. Selecting the most informative subset of features from the pool of available effort drivers rests on the hypothesis that reducing the dimensionality of a dataset may significantly reduce the complexity and time required to reach an optimal and accurate estimate. This paper compares two relatively popular feature selection techniques, Forward Feature Selection and Backward Feature Elimination, used with Fuzzy Analogy for software effort estimation. The empirical comparison is performed over eight well-known datasets using the Jackknife (leave-one-out) evaluation method. The results suggest that Fuzzy Analogy with feature subset selection generates more accurate estimates, in terms of the Standardized Accuracy (SA) and Pred(p) criteria, than Fuzzy Analogy without feature subset selection, regardless of the dataset used. Moreover, this study found that using Forward Feature Selection rather than Backward Feature Elimination may improve the prediction accuracy of Fuzzy Analogy and reduce the number of selected features.
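
The two wrapper searches and the evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Fuzzy Analogy itself (fuzzy-set representation of project attributes and fuzzy similarity) is not reproduced: a plain k-nearest-neighbour analogy estimator stands in for it. The search objective (maximizing Jackknife SA) is also an assumption, and MAR_P0 for SA is computed in closed form as the average error of "estimate with another project's effort" rather than by repeated random sampling. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def analogy_estimate(train_X: Matrix, train_y: List[float], query: Sequence[float],
                     features: List[int], k: int = 1) -> float:
    """Stand-in analogy estimator (NOT Fuzzy Analogy): mean effort of the k
    training projects closest to `query` in Euclidean distance over `features`."""
    dists = [(math.sqrt(sum((x[f] - query[f]) ** 2 for f in features)), effort)
             for x, effort in zip(train_X, train_y)]
    dists.sort(key=lambda t: t[0])  # stable sort: ties keep dataset order
    nearest = dists[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


def jackknife(X: Matrix, y: List[float], features: List[int], k: int = 1) -> List[float]:
    """Leave-one-out: estimate each project from all the remaining ones."""
    return [analogy_estimate(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], features, k)
            for i in range(len(X))]


def standardized_accuracy(actual: List[float], predicted: List[float]) -> float:
    """SA = (1 - MAR / MAR_P0) * 100; MAR_P0 here is the exact mean absolute
    residual of random guessing (estimating a project with another project's effort)."""
    n = len(actual)
    mar = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mar_p0 = sum(abs(actual[i] - actual[j])
                 for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return (1 - mar / mar_p0) * 100


def pred(actual: List[float], predicted: List[float], p: float = 25) -> float:
    """Pred(p): percentage of projects whose magnitude of relative error is <= p%."""
    hits = sum(1 for a, e in zip(actual, predicted) if abs(a - e) / a <= p / 100)
    return 100 * hits / len(actual)


def score(X: Matrix, y: List[float], features: List[int], k: int = 1) -> float:
    """Hypothetical wrapper objective: Jackknife SA of the estimator on `features`."""
    return standardized_accuracy(y, jackknife(X, y, features, k))


def forward_selection(X: Matrix, y: List[float], k: int = 1) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves SA; stop when none improves it."""
    remaining, selected, best = list(range(len(X[0]))), [], -math.inf
    while remaining:
        sa, f = max(((score(X, y, selected + [f], k), f) for f in remaining),
                    key=lambda t: t[0])
        if sa <= best:  # no strict improvement: stop
            break
        selected.append(f)
        remaining.remove(f)
        best = sa
    return selected, best


def backward_elimination(X: Matrix, y: List[float], k: int = 1) -> Tuple[List[int], float]:
    """Start from all features; drop the one whose removal gives the best SA,
    as long as SA does not decrease and at least one feature remains."""
    selected = list(range(len(X[0])))
    best = score(X, y, selected, k)
    while len(selected) > 1:
        sa, f = max(((score(X, y, [g for g in selected if g != f], k), f)
                     for f in selected), key=lambda t: t[0])
        if sa < best:  # every removal hurts: stop
            break
        selected.remove(f)
        best = sa
    return selected, best


# Hypothetical toy data (not from the paper): feature 0 tracks effort, feature 1 is noise.
TOY_X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 3]]
TOY_Y = [10, 20, 30, 40, 50]

if __name__ == "__main__":
    print(forward_selection(TOY_X, TOY_Y))     # ([0], 50.0)
    print(backward_elimination(TOY_X, TOY_Y))  # ([0], 50.0)
    print(pred(TOY_Y, jackknife(TOY_X, TOY_Y, [0])))  # 40.0
```

On this invented data both searches keep only feature 0; the example shows the mechanics of the two wrappers and of the SA and Pred(25) criteria only, and says nothing about the paper's findings. Forward selection grows the subset from empty, so it tends to stop early with few features, while backward elimination starts from the full set and only shrinks it while accuracy holds.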
