Software Effort Prediction using Statistical and Machine Learning Methods

Accurate software effort estimation is an important part of the software process. Effort is measured in person-months and duration. Both overestimation and underestimation of software effort can have risky consequences, and software project managers must estimate how much a software development project will cost. The dominant cost of any software project is the cost of effort, so effort estimation is crucial and there is always a need to improve its accuracy as much as possible. Many effort estimation models exist, but it is difficult to determine which model gives the most accurate estimates on a given dataset. This paper empirically evaluates and compares the potential of Linear Regression, Artificial Neural Network, Decision Tree, Support Vector Machine, and Bagging on a software project dataset. The dataset is obtained from 499 projects. The results show that the Mean Magnitude of Relative Error (MMRE) of the decision tree method is only 17.06%. Thus, the decision tree method performs better than all the other compared methods.
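
For readers unfamiliar with the accuracy measure reported above, the following is a minimal sketch of how MMRE is commonly computed: the magnitude of relative error, |actual - predicted| / actual, is averaged over all projects. The function name `mmre` and the sample effort values are illustrative assumptions and are not taken from the paper or its dataset.

```python
from typing import Sequence


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Magnitude of Relative Error, returned as a fraction (0.17 means 17%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one project is required")
    if any(a <= 0 for a in actual):
        raise ValueError("actual effort values must be positive")
    # Magnitude of relative error for each project, then the mean over projects.
    mre_values = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(mre_values) / len(mre_values)


if __name__ == "__main__":
    # Hypothetical effort values in person-months, for illustration only.
    actual_effort = [100.0, 200.0, 400.0]
    predicted_effort = [90.0, 220.0, 400.0]
    print(f"MMRE = {mmre(actual_effort, predicted_effort):.2%}")
```

A lower MMRE indicates estimates that are, on average, closer to the actual effort; under this definition, the 17.06% reported for the decision tree corresponds to a returned value of about 0.1706.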
