What should you optimize when building an estimation model?

When estimation models are derived from existing data, they are commonly evaluated using statistics such as the mean magnitude of relative error (MMRE). But the models themselves are usually derived by optimizing something else: typically, as in statistical regression, by minimizing the sum of squared deviations. How do estimation models for typical software engineering data fare, on various common accuracy statistics, if they are derived using other "fitness functions"? In this study, estimation models are built using a variety of fitness functions and evaluated using a wide range of accuracy statistics. We find that models based on minimizing actual errors generally outperform models based on minimizing relative errors. Given the nature of software engineering data sets, minimizing the sum of absolute deviations seems an effective compromise.
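To make the distinction between a fitness function (what the model is fitted to minimize) and an accuracy statistic (what it is later judged by) concrete, here is a minimal sketch. It is not the study's procedure: the one-parameter model effort = slope × size, the ternary-search fitting routine, the function names, and the five data points are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence

Fitness = Callable[[Sequence[float], Sequence[float]], float]


def sum_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared deviations (the usual least-squares criterion)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def sum_absolute(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of absolute deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error; assumes every actual value is > 0."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def predict(slope: float, sizes: Sequence[float]) -> List[float]:
    """Hypothetical one-parameter model: effort = slope * size."""
    return [slope * s for s in sizes]


def fit_slope(sizes: Sequence[float], efforts: Sequence[float],
              fitness: Fitness, iterations: int = 200) -> float:
    """Find the slope minimizing `fitness` by ternary search.

    All three fitness functions above are convex in the slope,
    so narrowing the bracket this way converges to a minimizer.
    """
    lo = 0.0
    hi = 2.0 * max(e / s for s, e in zip(sizes, efforts))
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fitness(efforts, predict(m1, sizes)) <= fitness(efforts, predict(m2, sizes)):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


# Made-up project data (size, effort); the last project is a deliberate outlier.
SIZES = [10.0, 20.0, 30.0, 40.0, 50.0]
EFFORTS = [12.0, 18.0, 35.0, 39.0, 120.0]

FITNESS: Dict[str, Fitness] = {
    "squared": sum_squared,
    "absolute": sum_absolute,
    "mmre": mmre,
}

# One model per fitness function.
SLOPES = {name: fit_slope(SIZES, EFFORTS, f) for name, f in FITNESS.items()}

if __name__ == "__main__":
    # Cross-evaluate: each row is a model, each column an accuracy statistic.
    for model, slope in SLOPES.items():
        scores = {stat: f(EFFORTS, predict(slope, SIZES)) for stat, f in FITNESS.items()}
        print(model, round(slope, 4), scores)
```

Running the script prints each fitted model's score on all three statistics. By construction, each model scores best on the statistic it was fitted to, so the comparison of interest is how each model fares on the other statistics.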
