SPACE Software Productivity Analysis and Cost Estimation

Multivariate regression models are commonly used to estimate software development effort in support of project planning and management. Building such a model requires a complete data set with no missing values. A complete data set is usually obtained either by applying an imputation method or by deleting the projects and/or metrics that have missing values (we call this RC deletion). However, it is unclear which approach is most suitable for effort estimation. In this paper, using the ISBSG data set of 706 projects (containing 47% missing values) collected from several companies, we applied four imputation methods (mean imputation, pairwise deletion, the k-NN method, and the CF method) and RC deletion to build regression models. Then, using a data set of 143 projects with no missing values, we evaluated the estimation performance of the models built after each imputation method or RC deletion. The results showed that the similarity-based imputation methods (the k-NN method and the CF method) performed better than the other methods in terms of MdMAE, MdMRE, MdMER, and Pred(25).
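To make the compared techniques concrete, the following is a minimal Python sketch of two of the imputation methods (mean imputation and k-NN imputation) and of the four accuracy measures. It is not the implementation used in the study. The function names, the toy data, the distance function, and the choice of k are all assumptions made for illustration, and the metric formulas are the definitions commonly used in the effort estimation literature (MRE relative to the actual effort, MER relative to the estimated effort, Pred(25) as the share of projects with MRE of at most 0.25). Pairwise deletion and the CF method are not shown.

```python
from statistics import mean, median

def mean_impute(rows):
    """Replace each missing value (None) with the mean of the observed
    values in the same column (metric)."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that column over the k
    most similar projects that have the value observed. Similarity is an
    assumed root-mean-square distance over the columns both projects share.
    Assumes every column has at least one observed value in another row."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return (sum((x - y) ** 2 for x, y in shared) / len(shared)) ** 0.5

    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, v in enumerate(row):
            if v is None:
                # (distance, donor value) pairs, nearest projects first
                donors = sorted(
                    (dist(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                filled[j] = mean(val for _, val in donors)
        result.append(filled)
    return result

def accuracy(actual, predicted):
    """Compute MdMAE, MdMRE, MdMER, and Pred(25) for paired effort values."""
    ae = [abs(a - p) for a, p in zip(actual, predicted)]
    mre = [e / a for e, a in zip(ae, actual)]      # error relative to actual
    mer = [e / p for e, p in zip(ae, predicted)]   # error relative to estimate
    return {
        "MdMAE": median(ae),
        "MdMRE": median(mre),
        "MdMER": median(mer),
        "Pred(25)": sum(m <= 0.25 for m in mre) / len(mre),
    }

# Hypothetical projects: [size metric, effort metric], None marks a missing value.
projects = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [10.0, 100.0]]
by_mean = mean_impute(projects)
by_knn = knn_impute(projects, k=2)
scores = accuracy([100.0, 200.0, 400.0], [110.0, 120.0, 400.0])
```

On this toy data, mean imputation fills the gap with the column mean, which the outlying large project pulls upward, whereas k-NN imputation draws only on the two projects of similar size. This is the intuition behind similarity-based imputation, not a reproduction of the reported results.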
