Investigating the use of moving windows to improve software effort prediction: a replicated study

To date, most research in software effort estimation has not taken chronology into account when selecting projects for training and validation sets. A chronological split uses each project’s starting and completion dates, so that any model estimating effort for a new project p is trained only on projects completed before p’s starting date. A study in 2009 (“S3”) investigated chronological splitting while also taking a project’s age into account. Its research question was whether a training set containing only the most recent past projects (a “moving window” of recent projects) would lead to more accurate estimates than the entire history of projects completed before the new project’s starting date. S3 found that moving windows could improve the accuracy of estimates. The study described here replicates S3 using three different and independent data sets. Estimation models were built using regression, and accuracy was measured using absolute residuals. The results contradict S3: they show no gain in estimation accuracy when windows are used for effort estimation. This is a surprising result, because it does not support the intuition that recent data should be more helpful than old data for effort estimation. Several factors, discussed in this paper, might have contributed to these contradictory results. Future work includes replicating this study using other data sets, to understand better when using windows is a suitable choice for software companies.
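The two training-set policies compared above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the study's actual setup: the `Project` fields, the helper names `training_set`, `fit_slope` and `absolute_residual`, the use of a window measured in number of projects, and the one-variable least-squares fit through the origin that stands in for the regression models. The project data are invented solely to show the mechanics and say nothing about which policy is more accurate.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Project:
    name: str
    start: date
    end: date
    size: float    # hypothetical size measure
    effort: float  # actual effort, known once the project is complete


def training_set(history: List[Project], target: Project,
                 window: Optional[int] = None) -> List[Project]:
    """Chronological split: keep only projects completed before the target starts.

    With window=None the whole eligible history is used; with window=N only
    the N most recently completed eligible projects are kept.
    """
    if window is not None and window <= 0:
        raise ValueError("window must be a positive number of projects")
    completed = sorted((p for p in history if p.end < target.start),
                       key=lambda p: p.end)
    return completed if window is None else completed[-window:]


def fit_slope(train: List[Project]) -> float:
    """Least-squares slope of effort on size through the origin (a stand-in model)."""
    if not train:
        raise ValueError("no completed projects available for training")
    return sum(p.size * p.effort for p in train) / sum(p.size ** 2 for p in train)


def absolute_residual(history: List[Project], target: Project,
                      window: Optional[int] = None) -> float:
    """|actual effort - estimated effort| for the target under the chosen policy."""
    train = training_set(history, target, window)
    estimate = fit_slope(train) * target.size
    return abs(target.effort - estimate)


# Invented toy data; project D is still running when the target starts.
history = [
    Project("A", date(2001, 1, 1), date(2001, 6, 30), 100, 1000),
    Project("B", date(2002, 1, 1), date(2002, 6, 30), 100, 500),
    Project("C", date(2003, 1, 1), date(2003, 6, 30), 200, 1000),
    Project("D", date(2003, 3, 1), date(2004, 3, 1), 50, 300),
]
target = Project("T", date(2004, 1, 1), date(2004, 6, 30), 100, 500)

if __name__ == "__main__":
    print([p.name for p in training_set(history, target)])
    print([p.name for p in training_set(history, target, window=2)])
    print(absolute_residual(history, target))
    print(absolute_residual(history, target, window=2))
```

In an evaluation along these lines, each project in turn would serve as the target, and the absolute residuals from the two policies would be compared across all targets.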
