Why Software Effort Estimation Needs SBSE

Industrial practitioners now face a bewildering array of possible configurations for effort estimation. How to select the best one for a particular dataset? This paper introduces OIL (short for optimized learning), a novel configuration tool for effort estimation based on differential evolution. When tested on 945 software projects, OIL significantly improved effort estimations, after exploring just a few configurations (just a few dozen). Further OIL's results are far better than two methods in widespread use: estimation-via-analogy and a recent state-of-the-art baseline published at TOSEM'15 by Whigham et al. Given that the computational cost of this approach is so low, and the observed improvements are so large, we conclude that SBSE should be a standard component of software effort estimation.

[1]  H. E. Dunsmore,et al.  Software engineering metrics and models , 1986 .

[2]  Markus Wagner,et al.  Data-Driven Search-Based Software Engineering , 2018, 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR).

[3]  Xin Yao,et al.  journal homepage: www.elsevier.com/locate/infsof Ensembles and locality: Insight on improving software effort estimation , 2022 .

[4]  Daniel Ryan Baker,et al.  A Hybrid Approach to Expert and Model Based Effort Estimation , 2007 .

[5]  Daryl Essam,et al.  Software project effort estimation using genetic programming , 2002, IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions.

[6]  Emilia Mendes,et al.  A Comparative Study of Cost Estimation Models for Web Hypermedia Applications , 2003, Empirical Software Engineering.

[7]  Mark Harman,et al.  Exact Mean Absolute Error of Baseline Predictor, MARP0 , 2016, Inf. Softw. Technol..

[8]  Marcel Korte,et al.  Confidence in software cost estimation results based on MMRE and PRED , 2008, PROMISE '08.

[9]  Magne Jørgensen,et al.  The Impact of Lessons-Learned Sessions on Effort Estimation and Uncertainty Assessments , 2009, IEEE Transactions on Software Engineering.

[10]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[11]  José Javier Dolado,et al.  A Validation of the Component-Based Method for Software Size Estimation , 2000, IEEE Trans. Software Eng..

[12]  Martin J. Shepperd,et al.  Software project economics: a roadmap , 2007, Future of Software Engineering (FOSE '07).

[13]  Chin-Liang Chang,et al.  Finding Prototypes For Nearest Neighbor Classifiers , 1974, IEEE Transactions on Computers.

[14]  Filomena Ferrucci,et al.  Genetic Programming for Effort Estimation: An Analysis of the Impact of Different Fitness Functions , 2010, 2nd International Symposium on Search Based Software Engineering.

[15]  Tim Menzies,et al.  A Deep Learning Model for Estimating Story Points , 2016, IEEE Transactions on Software Engineering.

[16]  Stephen G. MacDonell,et al.  Evaluating prediction systems in software project estimation , 2012, Inf. Softw. Technol..

[17]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[18]  Magne Jørgensen,et al.  A review of studies on expert estimation of software development effort , 2004, J. Syst. Softw..

[19]  Bernhard Pfahringer,et al.  Locally Weighted Naive Bayes , 2002, UAI.

[20]  Ayse Basar Bener,et al.  Exploiting the Essential Assumptions of Analogy-Based Effort Estimation , 2012, IEEE Transactions on Software Engineering.

[21]  Emilia Mendes,et al.  Investigating Tabu Search for Web Effort Estimation , 2010, 2010 36th EUROMICRO Conference on Software Engineering and Advanced Applications.

[22]  Mark Harman,et al.  Adaptive Multi-Objective Evolutionary Algorithms for Overtime Planning in Software Projects , 2017, IEEE Transactions on Software Engineering.

[23]  Stephen G. MacDonell,et al.  What accuracy statistics really measure , 2001, IEE Proc. Softw..

[24]  Xin Yao,et al.  A principled evaluation of ensembles of learning machines for software effort estimation , 2011, Promise '11.

[25]  Mark Harman,et al.  Multi-objective Software Effort Estimation , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[26]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007, IEEE Transactions on Software Engineering.

[27]  Lucas Layman,et al.  LACE2: Better Privacy-Preserving Data Sharing for Cross Project Defect Prediction , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[28]  R. Haupt Optimum population size and mutation rate for a simple real genetic algorithm that optimizes array factors , 2000, IEEE Antennas and Propagation Society International Symposium. Transmitting Waves of Progress to the Next Millennium. 2000 Digest. Held in conjunction with: USNC/URSI National Radio Science Meeting (C.

[29]  J. R. Quinlan Learning With Continuous Classes , 1992 .

[30]  Ioannis Stamelos,et al.  A Simulation Tool for Efficient Analogy Based Cost Estimation , 2000, Empirical Software Engineering.

[31]  Xin Yao,et al.  An analysis of multi-objective evolutionary algorithms for training ensemble models based on different performance measures in software effort estimation , 2013, PROMISE.

[32]  Martin J. Shepperd,et al.  Estimating Software Project Effort Using Analogies , 1997, IEEE Trans. Software Eng..

[33]  Tim Menzies,et al.  Tuning for Software Analytics: is it Really Necessary? , 2016, Inf. Softw. Technol..

[34]  Barbara A. Kitchenham,et al.  A Simulation Study of the Model Evaluation Criterion MMRE , 2003, IEEE Trans. Software Eng..

[35]  Tim Menzies,et al.  Transfer learning in effort estimation , 2015, Empirical Software Engineering.

[36]  D. Ross Jeffery,et al.  Analogy-X: Providing Statistical Inference to Analogy-Based Software Cost Estimation , 2008, IEEE Transactions on Software Engineering.

[37]  Chris F. Kemerer,et al.  An empirical validation of software cost estimation models , 1987, CACM.

[38]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[39]  Barry W. Boehm,et al.  Negative results for software effort estimation , 2016, Empirical Software Engineering.

[40]  Thong Ngee Goh,et al.  A study of project selection and feature weighting for analogy based software cost estimation , 2009, J. Syst. Softw..

[41]  Rainer Storn,et al.  Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces , 1997, J. Glob. Optim..

[42]  Karen T. Lum,et al.  Selecting Best Practices for Effort Estimation , 2006, IEEE Transactions on Software Engineering.

[43]  Michelle Cartwright,et al.  On Building Prediction Systems for Software Engineers , 2000, Empirical Software Engineering.

[44]  D. Ross Jeffery,et al.  An Empirical Study of Analogy-based Software Effort Estimation , 1999, Empirical Software Engineering.

[45]  Peter A. Whigham,et al.  A Baseline Model for Software Effort Estimation , 2015, TSEM.

[46]  Barbara A. Kitchenham,et al.  A Further Empirical Investigation of the Relationship Between MRE and Project Size , 2003, Empirical Software Engineering.

[47]  Colin J Burgess,et al.  Can genetic programming improve software effort estimation? A comparative evaluation , 2001, Inf. Softw. Technol..

[48]  Geoff Holmes,et al.  Benchmarking Attribute Selection Techniques for Discrete Class Data Mining , 2003, IEEE Trans. Knowl. Data Eng..

[49]  Tim Menzies,et al.  Data Mining Methods and Cost Estimation Models: Why is it So Hard to Infuse New Ideas? , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW).

[50]  Tim Menzies,et al.  What is wrong with topic modeling? And how to fix it using search-based software engineering , 2016, Inf. Softw. Technol..

[51]  David H. Wolpert,et al.  The Lack of A Priori Distinctions Between Learning Algorithms , 1996, Neural Computation.

[52]  Martin J. Shepperd,et al.  Using Genetic Programming to Improve Software Effort Estimation Based on General Data Sets , 2003, GECCO.

[53]  Tim Menzies,et al.  On the Value of Ensemble Effort Estimation , 2012, IEEE Transactions on Software Engineering.

[54]  Daniel Port,et al.  Comparative studies of the model evaluation criterions mmre and pred in software cost estimation research , 2008, ESEM '08.