Software Effort Estimation with Ridge Regression and Evolutionary Attribute Selection

Software cost estimation is a prerequisite managerial activity that is carried out at the initiation stage of software development and repeated throughout the software life-cycle so that the total cost can be revised. Typically, a selection of project attributes is used to estimate the human effort required to deliver a software product. However, choosing the appropriate project cost drivers in each case requires considerable experience and knowledge on the part of the project manager, which can only be obtained through years of software engineering practice. A number of studies indicate that popular methods applied in the literature for software cost estimation, such as linear regression, are not robust enough and do not yield accurate predictions. Recently, the dual-variables Ridge Regression (RR) technique has been used for effort estimation with promising results. In this work we show that the results may be further improved if an AI method is used to automatically select appropriate project cost drivers (inputs) for the technique. We propose a hybrid approach that combines RR with a Genetic Algorithm, where the latter evolves the subset of attributes used to approximate effort more accurately. The proposed hybrid cost model has been applied to a widely known high-dimensional dataset of software project samples, and the results obtained show that accuracy may be increased if redundant attributes are eliminated.
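The hybrid idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a linear kernel for the dual form of RR, mean squared error on a held-out validation split as the fitness, and a simple Genetic Algorithm with binary tournament selection, uniform crossover, bit-flip mutation and elitism. All function names, parameter values and the choice of fitness measure are hypothetical and are introduced here only for illustration.

```python
import numpy as np

def dual_ridge_fit(X, y, lam):
    # Dual variables: alpha = (K + lam * I)^-1 y, with linear kernel K = X X^T (assumed).
    K = X @ X.T
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def dual_ridge_predict(X_train, alpha, X_new):
    # Prediction is the kernel between new and training samples times alpha.
    return (X_new @ X_train.T) @ alpha

def fitness(mask, X_tr, y_tr, X_val, y_val, lam):
    # Validation MSE of RR restricted to the attributes selected by the boolean mask.
    if not mask.any():
        return np.inf  # an empty attribute subset is not a usable model
    alpha = dual_ridge_fit(X_tr[:, mask], y_tr, lam)
    pred = dual_ridge_predict(X_tr[:, mask], alpha, X_val[:, mask])
    return float(np.mean((pred - y_val) ** 2))

def ga_select(X_tr, y_tr, X_val, y_val, lam=1.0, pop_size=20,
              generations=30, p_mut=0.1, seed=0):
    # Hypothetical GA settings; each individual is a boolean mask over attributes.
    rng = np.random.default_rng(seed)
    n_attr = X_tr.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_attr)).astype(bool)
    # Start from the full attribute set as the baseline to beat.
    best_mask = np.ones(n_attr, dtype=bool)
    best_err = fitness(best_mask, X_tr, y_tr, X_val, y_val, lam)
    for _ in range(generations):
        errs = np.array([fitness(m, X_tr, y_tr, X_val, y_val, lam) for m in pop])
        i = int(np.argmin(errs))
        if errs[i] < best_err:
            best_err, best_mask = float(errs[i]), pop[i].copy()
        # Binary tournament selection: keep the better of two random individuals.
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = pop[np.where(errs[a] <= errs[b], a, b)]
        # Uniform crossover with a randomly shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        take = rng.random((pop_size, n_attr)) < 0.5
        children = np.where(take, parents, partners)
        # Bit-flip mutation, then elitism: reinsert the best mask found so far.
        pop = children ^ (rng.random((pop_size, n_attr)) < p_mut)
        pop[0] = best_mask
    return best_mask, best_err
```

Calling `ga_select(X_tr, y_tr, X_val, y_val)` on a training and validation split returns the best attribute mask found and its validation error. Because the search starts from the full attribute set, the returned error on the validation split is never worse than that of RR using all attributes. In practice the fitness would more likely be a cross-validated effort-estimation measure than a single-split MSE.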
