Feature Subset Selection for Software Cost Modelling and Estimation

Feature selection has recently been used in software engineering to improve the accuracy and robustness of software cost models. The idea of selecting the most informative subset of features from a pool of available cost drivers stems from the hypothesis that reducing the dimensionality of a dataset will significantly reduce the complexity and time required to reach an estimate with a particular modelling technique. This work investigates the appropriateness of attributes obtained from empirical project databases and aims to reduce the number of cost drivers used while preserving performance. Finding a suitable subset that may yield improved predictions can be treated either as a pre-processing step for the technique employed for cost estimation (filter or wrapper methods) or as an internal step of that technique that minimises the fitting error (embedded methods). This paper compares nine relatively popular feature selection methods and uses the empirical values of the selected attributes recorded in the ISBSG and Desharnais datasets to estimate software development effort.
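
As a rough illustration of the filter idea mentioned above, the following minimal sketch ranks candidate cost drivers by the absolute Pearson correlation with effort and keeps the top k. It is not one of the nine methods compared in the paper, which the abstract does not name. The function names (`pearson`, `filter_select`), the feature names, and all numeric values are hypothetical and invented for the example; they are not taken from the ISBSG or Desharnais datasets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def filter_select(columns, target, k):
    """Filter-style selection: score each feature independently of any
    estimation model, then keep the k highest-scoring features."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy, made-up project data (assumption: purely illustrative values).
projects = {
    "size_fp": [100, 200, 300, 400, 500],
    "team_size": [3, 5, 4, 8, 9],
    "language_code": [1, 2, 1, 2, 1],
}
effort = [1100, 1900, 3100, 3900, 5100]

selected, scores = filter_select(projects, effort, k=2)
print(selected)
```

A wrapper method would instead score each candidate subset by training and evaluating the estimation model itself, which is typically more expensive but takes interactions between features and the model into account.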
