Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

Software Effort Estimation (SEE) models can support software managers' decisions by estimating the effort required to develop a software project. They are built from data describing projects completed in the past. Such data may include past projects from within the company of interest (within-company, i.e., WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome the limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created from past data represents well the projects currently being estimated? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which past WC or CC models are most useful for making predictions for a given company at the present time. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL successfully improves SEE by emphasizing the most useful past models. A thorough analysis of DCL's behaviour is provided, strengthening its external validity.
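The core idea of emphasizing the predictions of the currently most useful past models can be illustrated with a minimal sketch of a dynamically weighted ensemble. This is not the authors' exact DCL algorithm; the class name, the multiplicative penalty scheme, and the `beta` hyperparameter are all illustrative assumptions.

```python
# Hypothetical sketch: weight past WC/CC effort models dynamically, so the
# models that track the current environment dominate future predictions.
# (Illustrative only; not the published DCL algorithm.)

class DynamicModelWeighting:
    def __init__(self, models, beta=0.5):
        self.models = models                 # callables: project features -> effort
        self.weights = [1.0] * len(models)   # one weight per past model
        self.beta = beta                     # assumed penalty factor in (0, 1)

    def predict(self, x):
        # Weighted average of the base models' effort estimates.
        total = sum(self.weights)
        return sum(w * m(x) for w, m in zip(self.weights, self.models)) / total

    def update(self, x, actual_effort):
        # Once a project's true effort is known, penalize each model in
        # proportion to its relative error, then renormalize the weights.
        for i, m in enumerate(self.models):
            rel_err = abs(m(x) - actual_effort) / max(actual_effort, 1e-9)
            self.weights[i] *= self.beta ** rel_err
        s = sum(self.weights)
        self.weights = [w / s for w in self.weights]
```

Under this scheme a model that was helpful in the past but becomes misleading after an environment change loses weight automatically as new completed projects arrive, matching the abstract's notion that past models can drift in and out of usefulness.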
