Using Chronological Splitting to Compare Cross- and Single-company Effort Models: Further Investigation

Numerous studies have used historical datasets to build and validate models for estimating software development effort. Very few have used a chronological split (where projects' end dates are used so that training sets only contain projects that were completed before the start date of each project in the validation set), and only one has compared chronological splitting with random splitting. The aim of this study is therefore to investigate further, and compare, chronological and random splitting. We do so in the context of comparing cross-company and single-company models for effort estimation. We used 450 single-company projects and 741 cross-company projects from the ISBSG Release 10 repository. Estimates were obtained using manual stepwise regression. We found that, with these data, neither the use of chronological splitting nor the choice of splitting date affected prediction accuracy. We could not reach a consistent set of findings when comparing cross- and single-company predictions, because different accuracy measures gave contradictory results.
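To make the difference between the two splitting schemes concrete, here is a minimal Python sketch. It assumes one possible reading of chronological splitting based on a single splitting date: training projects must have ended before that date and validation projects must have started on or after it, so every training project is completed before any validation project begins. The `Project` record, its field names, the helper functions and the 66% training fraction for the random split are all hypothetical illustrations, not the authors' implementation or ISBSG field names.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Project:
    # Hypothetical fields for illustration; real ISBSG records carry many more attributes.
    project_id: str
    start_date: date
    end_date: date
    effort: float  # e.g. person-hours


def chronological_split(projects, split_date):
    """Assumed single-date scheme: train on projects completed before split_date,
    validate on projects started on or after it. Projects spanning split_date
    are left out, so no training project overlaps any validation project."""
    training = [p for p in projects if p.end_date < split_date]
    validation = [p for p in projects if p.start_date >= split_date]
    return training, validation


def random_split(projects, train_fraction=0.66, seed=None):
    """Baseline: shuffle and cut, ignoring dates (train_fraction is an assumption)."""
    rng = random.Random(seed)
    shuffled = list(projects)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def satisfies_chronology(training, validation):
    """True if every training project ended before every validation project started."""
    if not training or not validation:
        return True
    return max(p.end_date for p in training) < min(p.start_date for p in validation)


# Hypothetical example data, not taken from ISBSG.
projects = [
    Project("A", date(2001, 1, 10), date(2001, 9, 30), 1200.0),
    Project("B", date(2001, 6, 1), date(2002, 3, 15), 3400.0),
    Project("C", date(2002, 2, 1), date(2002, 11, 20), 800.0),
    Project("D", date(2002, 7, 1), date(2003, 4, 1), 2500.0),
    Project("E", date(2003, 1, 5), date(2003, 8, 30), 1500.0),
]

if __name__ == "__main__":
    train, valid = chronological_split(projects, date(2002, 6, 1))
    print([p.project_id for p in train], [p.project_id for p in valid])
    print(satisfies_chronology(train, valid))

    r_train, r_valid = random_split(projects, seed=1)
    print(satisfies_chronology(r_train, r_valid))  # may be False for a random split
```

In this sketch, project C starts before the splitting date but finishes after it, so it falls into neither set; discarding such projects is one way to guarantee the constraint, at the cost of some data. A random split gives no such guarantee, which `satisfies_chronology` can be used to check. Varying `split_date` corresponds to trying different splitting dates.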
