Bayesian statistical models for predicting software development effort

Constructing an accurate effort prediction model is a challenge in software engineering. This paper presents new Bayesian statistical models for predicting the development effort of software systems in the International Software Benchmarking Standards Group (ISBSG) dataset. The first is a Bayesian linear regression (BR) model and the second is a Bayesian multivariate normal distribution (BMVN) model. Both models are calibrated using subsets randomly sampled from the dataset. Their predictive accuracy is evaluated on other subsets, which consist only of cases unknown to the models, and is measured in terms of the absolute residuals and the magnitude of relative error. The Bayesian models are compared with the corresponding linear regression models. The results show that, in general, the Bayesian models have predictive accuracy equivalent to that of the linear regression models. However, the advantage of the Bayesian statistical models is that they do not require a calibration subset as large as their regression counterparts do. For the ISBSG dataset, it is confirmed that the predictive accuracy of the Bayesian statistical models, in particular the BMVN model, is significantly better than that of the linear regression model when the calibration subset consists of five or fewer software systems. This finding justifies the use of Bayesian statistical models in software effort prediction, in particular when the system of interest has only a very small amount of historical data.
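
The two accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions, with the absolute residual as |actual - predicted| and the magnitude of relative error (MRE) as |actual - predicted| / actual; the abstract does not spell these out. The function names and the effort figures are hypothetical and are not taken from the ISBSG dataset.

```python
from statistics import mean


def absolute_residuals(actual, predicted):
    """Return |actual - predicted| for each project."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def magnitude_of_relative_error(actual, predicted):
    """Return |actual - predicted| / actual for each project.

    Assumes every actual effort value is strictly positive.
    """
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


# Hypothetical effort values (e.g. person-hours) for three held-out projects.
actual_effort = [1000.0, 2000.0, 500.0]
predicted_effort = [800.0, 2500.0, 500.0]

abs_res = absolute_residuals(actual_effort, predicted_effort)
mre = magnitude_of_relative_error(actual_effort, predicted_effort)

# Mean MRE (MMRE) is one common way to summarise MRE over a test subset.
mmre = mean(mre)
```

Because MRE divides by the actual effort, it weights an error on a small project more heavily than the same absolute error on a large one, which is one reason to report absolute residuals alongside it.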
