A Comparison of Cross-Versus Single-Company Effort Prediction Models for Web Projects

Background: To address the challenges faced by companies with little or no effort data of their own, previous studies have focused on cross-company models. Web projects have been a particular domain of investigation. Aim: This study investigates how effective effort predictions obtained using cross-company (CC) datasets are relative to predictions obtained using single-company (SC) datasets within the domain of Web projects. Method: This study uses the Tukutuku database. We used data on 125 projects from eight different companies and built cross-company and single-company models using stepwise linear regression (SWR), with and without relevancy filtering. We also benchmarked these models against mean-based and median-based models. We report a case-by-case analysis per company as well as a meta-analysis of the findings. Results: CC models provided poor predictions and performed significantly worse than SC models. However, relevancy-filtered CC models yielded results comparable to those of SC models. These results corroborate previous research. An interesting result was that the median-based models were consistently better than the other models. Conclusions: We conclude that companies that carry out Web development may use a median-based CC model for prediction until they are able to build their own SC model, which can be used by itself or in combination with median-based estimations.
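As a rough illustration of the mean-based and median-based benchmarks mentioned in the Method, the following minimal sketch predicts one constant effort value from a cross-company training set and scores it against a target company's projects. The effort values are synthetic and invented for this example (they are not from the Tukutuku database), and the function names and the use of mean absolute error are assumptions made here, not details of the study's procedure.

```python
from statistics import mean, median

# Synthetic effort values (person-hours), invented for illustration only;
# they are NOT taken from the Tukutuku database.
cross_company_efforts = [20.0, 35.0, 50.0, 80.0, 120.0, 900.0]
target_company_efforts = [40.0, 55.0, 60.0, 75.0]


def baseline_predictions(training_efforts):
    """Return the constant prediction of a mean-based and a median-based model."""
    return {"mean": mean(training_efforts), "median": median(training_efforts)}


def mean_absolute_error(actuals, predicted):
    """Average absolute gap between each actual effort and one constant prediction."""
    return sum(abs(actual - predicted) for actual in actuals) / len(actuals)


# Fit both baselines on cross-company data, then score them on the target company.
cc_baselines = baseline_predictions(cross_company_efforts)
errors = {
    name: mean_absolute_error(target_company_efforts, value)
    for name, value in cc_baselines.items()
}

for name in ("mean", "median"):
    print(f"{name}-based CC model: prediction={cc_baselines[name]:.2f}, MAE={errors[name]:.2f}")
```

With this synthetic data, the single very large project pulls the mean far above the target company's efforts, while the median is unaffected. This shows one way a median-based baseline can be more robust to skewed effort data; it does not reproduce the study's results.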
