Genetic Programming for Effort Estimation: An Analysis of the Impact of Different Fitness Functions

Context: Search-based methods have recently been proposed for software development effort estimation, and some case studies have been carried out to assess the effectiveness of Genetic Programming (GP). The results reported in the literature showed that GP can provide an estimation accuracy comparable to or slightly better than that of some widely used techniques, and encouraged further research into whether the estimation accuracy can be improved by varying the fitness function. Aim: Starting from these considerations, in this paper we report on a case study that analyses the role played by several fitness functions in the accuracy of the estimates. Method: We performed a case study based on a publicly available dataset, Desharnais, applying 3-fold cross-validation and using summary measures and statistical tests to analyse the results. Moreover, we compared the accuracy of the obtained estimates with that achieved by two widely used estimation methods, namely Case-Based Reasoning (CBR) and Manual Stepwise Regression (MSWR). Results: The results highlight that the choice of fitness function significantly affected the estimation accuracy. They also revealed that, for the considered dataset, GP provided significantly better estimates than CBR and estimates comparable to those of MSWR.
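To make the idea of a fitness function for effort estimation concrete, the following minimal Python sketch shows two accuracy measures that are commonly used in the effort-estimation literature, MMRE and Pred(25), together with a simple 3-fold split. The abstract does not name the fitness functions or summary measures that were actually studied, so the choice of MMRE and Pred(25), the function names, and the interleaved fold assignment are all assumptions made for illustration only.

```python
from statistics import mean


def mre(actual, predicted):
    """Magnitude of relative error for one project (actual effort must be > 0)."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean MRE over a set of projects; lower is better (assumed fitness candidate)."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE is at most `level`; higher is better."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


def k_fold_indices(n, k=3):
    """Yield (train, test) index lists for a k-fold split.

    Folds are built by interleaving indices; this is a hypothetical
    assignment, not necessarily the one used in the study.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Illustrative effort values (person-hours), not taken from Desharnais.
    actual_effort = [100, 200, 400]
    estimated_effort = [110, 150, 100]
    print("MMRE:", mmre(actual_effort, estimated_effort))
    print("Pred(25):", pred(actual_effort, estimated_effort))
    for train, test in k_fold_indices(6, k=3):
        print("train:", train, "test:", test)
```

In a GP run, a candidate estimation model would be evaluated on the training folds with one of these measures (minimising MMRE or maximising Pred(25), for instance), and the remaining fold would be used to assess the accuracy of the evolved model.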
