Software effort estimation using machine learning techniques

Estimating the effort a project will require is one of the most important activities in software project management. This paper presents an estimation approach based on machine learning techniques for non-quantitative data, carried out in two phases. The first phase selects an optimal feature set from high-dimensional data on past projects, using a quantitative analysis based on Rough Set Theory for feature reduction. The feature reduction is performed on the public-domain USP05 dataset. The second phase estimates effort from the optimal feature set obtained in the first phase. The estimation is carried out separately with two techniques, a Naive Bayes classifier and an Artificial Neural Network. The performance of the two methods is evaluated and compared using Mean Magnitude of Relative Error (MMRE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and the correlation coefficient. The Naive Bayes classifier is observed to achieve better estimation results than the Neural Network technique.
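
The four evaluation measures named above can be illustrated with a minimal Python sketch. It uses the standard textbook definitions (MMRE as the mean of |actual - predicted| / actual, and the Pearson correlation coefficient), which are assumed here rather than taken from the paper. The function names and the sample effort values are hypothetical.

```python
import math

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |a - p| / a (assumes a != 0)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical effort values (e.g. person-hours), for illustration only.
actual_effort = [10.0, 20.0, 40.0]
predicted_effort = [12.0, 18.0, 40.0]

print("MMRE:", mmre(actual_effort, predicted_effort))
print("RMSE:", rmse(actual_effort, predicted_effort))
print("MAE:", mae(actual_effort, predicted_effort))
print("Correlation:", correlation(actual_effort, predicted_effort))
```

Lower MMRE, RMSE and MAE indicate better estimates, while a correlation coefficient closer to 1 indicates that predictions track actual effort more closely.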
