A comparative study for estimating software development effort intervals

Software cost/effort estimation is still an open challenge. Researchers have proposed many methods, most of which focus on point estimates, and to date software cost estimation has been treated as a regression problem. However, to guard against overestimates and underestimates, it is more practical to predict an interval for the estimate than an exact value. In this paper, we propose an approach that converts cost estimation into a classification problem: each new software project is classified into one of several effort classes, each of which corresponds to an effort interval. Our approach integrates cluster analysis with classification methods. Cluster analysis is used to determine the effort intervals, while different classification algorithms are used to find the corresponding effort classes. The proposed approach is applied to seven public datasets. Our experimental results show that the hit rates obtained for effort estimation are around 90–100%, which is much higher than those obtained by related studies. Furthermore, in terms of point estimation, our results are comparable to those in the literature, even though only a simple mean/median of the predicted interval is used as the estimate. Finally, the dynamic generation of effort intervals is the most distinctive part of our study: because it removes human intervention from defining the intervals, it saves project managers time and effort.
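The pipeline described above (cluster to obtain effort intervals, classify a new project into one of them, then score by hit rate and a median point estimate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the clustering is a plain 1-D k-means over effort values, the classifier is a 1-nearest-neighbour rule, the two features (size, team size) and all numbers are toy data, and the names `kmeans_1d`, `effort_intervals`, `label`, and `predict_class` are hypothetical.

```python
from statistics import median

def kmeans_1d(values, k, iters=50):
    """Group scalar effort values with plain k-means (assumed clustering step)."""
    srt = sorted(values)
    # Deterministic start: spread the initial centres over the sorted values.
    centres = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return groups

def effort_intervals(efforts, k):
    """Turn each non-empty cluster into a (min, max) effort interval."""
    return sorted((min(g), max(g)) for g in kmeans_1d(efforts, k) if g)

def label(effort, intervals):
    """Index of the interval (effort class) that contains a training effort."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= effort <= hi:
            return i
    raise ValueError("effort outside all intervals")

def predict_class(x, train_X, train_y):
    """1-nearest-neighbour classifier (a stand-in for any classifier)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(x, train_X[i]))
    return train_y[best]

# Toy data: (size, team size) features with known efforts.
train_X = [(10, 2), (12, 3), (11, 2), (50, 6), (55, 7), (52, 6),
           (120, 15), (130, 14), (125, 16)]
train_effort = [20, 25, 22, 110, 120, 115, 400, 420, 410]
test_X = [(11, 3), (54, 6), (128, 15)]
test_effort = [23, 118, 405]

# 1) Intervals are generated from the data, not chosen by hand.
intervals = effort_intervals(train_effort, k=3)
train_y = [label(e, intervals) for e in train_effort]

# 2) Classify each new project into an effort class.
predicted = [predict_class(x, train_X, train_y) for x in test_X]

# 3) Hit rate: share of projects whose actual effort lies in the predicted interval.
hits = [intervals[c][0] <= e <= intervals[c][1]
        for c, e in zip(predicted, test_effort)]
hit_rate = sum(hits) / len(hits)

# 4) Point estimate: median training effort of the predicted class.
point_estimates = [median(e for e, y in zip(train_effort, train_y) if y == c)
                   for c in predicted]

print(intervals, hit_rate, point_estimates)
```

In this sketch the number of clusters `k` is fixed by hand; the approach described above instead lets the cluster analysis determine the intervals, so a faithful implementation would replace `kmeans_1d` with whichever clustering method is actually used.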
