A comparative evaluation on the accuracies of software effort estimates from clustered data

Precise estimation of the required software development effort is a critical factor in the success of software project management. Most existing studies of software effort estimation models compare the accuracy of effort estimates derived from historical data without clustering. A potential factor that can affect the accuracy of the established effort estimation models is the homogeneity of the data. However, the effect of data homogeneity on the accuracy of the derived effort estimates is seldom explored in the software effort estimation literature. Therefore, this paper explores how the accuracy of software effort estimation models changes when they are established from clustered data, using the International Software Benchmarking Standards Group (ISBSG) repository. The ordinary least squares (OLS) regression method is adopted to establish a separate effort estimation model for each cluster of data. The empirical results show no significant differences in estimation accuracy among the datasets clustered by each software effort driver. They also show that software effort estimation models built from the clustered data achieve accuracy similar to that of models built from the entire data without clustering.
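The comparison described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: a single predictor (project size), clusters given as plain labels, accuracy measured by the mean magnitude of relative error (MMRE), and in-sample evaluation. The function names `fit_ols`, `mmre`, and `compare_clustered_vs_pooled` are hypothetical, and the input format is invented for illustration; this is not the paper's actual procedure or data.

```python
from collections import defaultdict
from statistics import mean


def fit_ols(xs, ys):
    """Closed-form simple OLS fit; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mmre(actual, predicted):
    """Mean magnitude of relative error (assumed accuracy measure)."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def compare_clustered_vs_pooled(projects):
    """projects: list of (cluster_label, size, effort) tuples (hypothetical format).

    Returns a dict mapping "all" and each cluster label to the in-sample
    MMRE of an OLS model fitted on that subset.
    """
    groups = defaultdict(list)
    for label, size, effort in projects:
        groups["all"].append((size, effort))   # pooled model uses every project
        groups[label].append((size, effort))   # one model per cluster

    results = {}
    for label, rows in groups.items():
        sizes = [s for s, _ in rows]
        efforts = [e for _, e in rows]
        intercept, slope = fit_ols(sizes, efforts)
        predictions = [intercept + slope * s for s in sizes]
        results[label] = mmre(efforts, predictions)
    return results
```

A realistic study would evaluate each model on held-out projects rather than on its own training data, and would test whether the accuracy differences between clusters are statistically significant.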
