Bayesian statistical effort prediction models for data-centred 4GL software development

Abstract Constructing an accurate effort prediction model is a challenge in Software Engineering. This paper presents three Bayesian statistical software effort prediction models for database-oriented software systems that are developed using a specific 4GL tool suite. The models consist of specification-based software size metrics and a development team productivity metric. The models are constructed from the subjective knowledge of a human expert and calibrated using empirical data collected from 17 software systems developed in the target environment. The models' predictive accuracy is evaluated using subsets of the same data that were not used for calibration. The results show that the models achieve very good predictive accuracy in terms of the MMRE and pred measures, which confirms that Bayesian statistical models can predict effort successfully in the target environment. In comparison with commonly used multiple linear regression models, the Bayesian statistical models' predictive accuracy is generally equivalent. However, when fewer than five software systems are used for calibration, the predictive accuracy of the best Bayesian statistical models is significantly better than that of the multiple linear regression model. This result suggests that Bayesian statistical models would be a better choice when software organizations or practitioners do not possess sufficient empirical data for calibration. The authors expect these findings to encourage more researchers to investigate the use of Bayesian statistical models for predicting software effort.
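
The MMRE and pred measures named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the effort estimation literature: the magnitude of relative error of one system is |actual - predicted| / actual, MMRE is the mean of these values, and pred(l) is the proportion of systems whose relative error does not exceed l. The function names and the effort figures below are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mre(actual, predicted):
    """Magnitude of relative error for a single system."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all systems."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """Proportion of systems whose MRE is at most `level`."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(e <= level for e in errors) / len(errors)


# Hypothetical effort values (e.g. person-hours), for illustration only.
actual_effort = [100.0, 200.0, 400.0, 800.0]
predicted_effort = [110.0, 150.0, 400.0, 1200.0]

print("MMRE:", mmre(actual_effort, predicted_effort))
print("pred(0.25):", pred(actual_effort, predicted_effort, 0.25))
```

A lower MMRE and a higher pred(l) both indicate better predictive accuracy, so the two measures are usually reported together.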
