Investigating the Use of Duration-Based Moving Windows to Improve Software Effort Prediction

To date, most research in software effort estimation has not taken into account any form of chronological split when selecting projects for training and testing sets. A chronological split uses each project's starting and completion dates, so that any model estimating effort for a new project p uses as its training set only projects that were completed before p's starting date. Three recent studies investigated chronological splits using a moving window, that is, a subset of the most recent projects completed before p's starting date. They found some evidence in favour of using windows whenever projects were recent. These studies all defined window size as a fixed number of recent projects. In practice, we suggest that estimators are more likely to think in terms of elapsed time than in terms of the size of the data set when deciding which projects to include in a training set. Therefore, this paper investigates the effect on accuracy of using moving windows of various durations to form the training sets on which effort estimates are based. Our results show that windows based on duration can affect the accuracy of estimates (in this data set, a window of about three years' duration appears best), but to a lesser extent than windows based on a fixed number of projects.
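
The three ways of selecting a training set described above (a plain chronological split, a window of a fixed number of projects, and a window of a fixed duration) can be illustrated with a minimal sketch. The `Project` record, the function names, and the choice to measure the window in days back from the target project's starting date are assumptions made for illustration; they are not taken from the paper, which may define window membership differently.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Project:
    """Hypothetical project record; field names are illustrative only."""
    name: str
    start: date
    end: date
    effort_hours: float


def chronological_split(history: List[Project], target: Project) -> List[Project]:
    """Keep only projects completed before the target project started."""
    return [p for p in history if p.end < target.start]


def size_window(history: List[Project], target: Project, n: int) -> List[Project]:
    """Window of the n most recently completed eligible projects."""
    eligible = sorted(chronological_split(history, target), key=lambda p: p.end)
    return eligible[-n:] if n > 0 else []


def duration_window(history: List[Project], target: Project,
                    window_days: int) -> List[Project]:
    """Window of eligible projects completed within window_days before the
    target's start (assumption: membership is judged by completion date)."""
    earliest = target.start - timedelta(days=window_days)
    return [p for p in history if earliest <= p.end < target.start]
```

With a duration-based window, the number of training projects varies from one target project to the next, whereas a size-based window holds the count fixed and lets the time span vary. That difference is the contrast the abstract draws between the two approaches.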
