Assessing Variation in Development Effort Consistency Using a Data Source with Missing Data

Abstract: In this study the authors analyse Release 8.0 of the International Software Benchmarking Standards Group (ISBSG) data repository, which comprises project data from several different companies. The repository contains missing data, which must be handled appropriately; otherwise, inferences drawn from it may be biased and misleading. The authors re-examine a statistical model that explained about 62% of the variability in actual software development effort (Summary Work Effort) and was conditioned on a sample of 339 observations from the repository. The model included the covariates Adjusted Function Points and Maximum Team Size, and depended on Language Type (with categories 2nd, 3rd and 4th Generation Languages and Application Program Generators) and Development Type (enhancement, new development and re-development). The authors now use Bayesian inference and the Bayesian statistical simulation program BUGS to impute missing data, which avoids deleting observations with missing Maximum Team Size and increases the sample size to 616. Provided that imputation does not introduce distributional biases, the accuracy of inferences made from models that fit the data will increase. With imputation, models that fit the data and explain about 59% of the variability in actual effort are identified. These models enable new inferences to be made about Language Type and Development Type. The sensitivity of the inferences to alternative distributions for imputing missing data is also considered. Furthermore, the authors consider the impact of these distributions on the explained variability of actual effort and show how valid effort estimates can be derived to improve estimate consistency.
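The idea of imputing a missing covariate rather than deleting incomplete observations can be illustrated with a minimal sketch. It is not the authors' BUGS model: it uses synthetic data rather than the ISBSG repository, assumes log-transformed variables, and replaces full Bayesian inference with a simplified stochastic regression imputation repeated over several draws. The names `impute_and_fit`, `r_squared`, `log_size`, `log_team` and `log_effort`, and all numeric settings, are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares and return the proportion of variance explained."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

def impute_and_fit(log_size, log_team, log_effort, n_draws=20, seed=0):
    """Impute missing log_team values by random draws, then average R^2 over draws.

    Simplification (assumption): imputation parameters are held fixed at their
    complete-case estimates instead of being sampled from a posterior.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(log_team)
    ones = np.ones_like(log_size)
    # Imputation model: log_team regressed on log_size and log_effort (complete cases only).
    Z = np.column_stack([ones, log_size, log_effort])
    gamma, *_ = np.linalg.lstsq(Z[~missing], log_team[~missing], rcond=None)
    sigma = np.std(log_team[~missing] - Z[~missing] @ gamma, ddof=Z.shape[1])
    scores = []
    for _ in range(n_draws):
        filled = log_team.copy()
        # Draw each missing value from the fitted conditional distribution.
        filled[missing] = Z[missing] @ gamma + rng.normal(0.0, sigma, missing.sum())
        X = np.column_stack([ones, log_size, filled])
        scores.append(r_squared(X, log_effort))
    return float(np.mean(scores))

# Synthetic stand-in data (hypothetical coefficients, not estimates from the study).
rng = np.random.default_rng(1)
n = 616
log_size = rng.normal(5.5, 1.0, n)
log_team = 0.3 * log_size + rng.normal(0.0, 0.5, n)
log_effort = 1.0 + 0.8 * log_size + 0.6 * log_team + rng.normal(0.0, 0.7, n)
observed_team = log_team.copy()
observed_team[rng.random(n) < 0.45] = np.nan  # mark some team sizes as missing

print("complete cases:", int((~np.isnan(observed_team)).sum()), "of", n)
print("mean R^2 with imputation:", round(impute_and_fit(log_size, observed_team, log_effort), 3))
```

Because the imputation model here uses the effort variable itself, the resulting explained variability should be read with care; checking how it changes under alternative imputation distributions is the kind of sensitivity analysis the abstract describes.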
