Local bias and its impacts on the performance of parametric estimation models

Background: Realistic software estimates require parametric models that are continuously calibrated and validated. In practice, however, variations in how models are adopted and used introduce a great deal of local bias into the resulting historical data. This local bias should be carefully examined and addressed before the historical data is used to calibrate new versions of parametric models. Aims: This study investigates the degree of local bias in a cross-company historical dataset and assesses its impact on the performance of a parametric estimation model. Method: The study consists of three parts: 1) defining a method for measuring and analyzing the local bias associated with each individual organization's data subset within the overall dataset; 2) assessing the impact of local bias on the estimation performance of the COCOMO II 2000 model; 3) performing a correlation analysis to verify that local bias can be harmful to the performance of a parametric estimation model. Results: The results show that local bias negatively impacts the performance of the parametric model. Our measure of local bias has a statistically significant positive correlation with the performance. Conclusion: Local calibration using the whole multi-company dataset would yield worse performance. The influence of multi-company data can be characterized by local bias and measured by our method.
