Predicting project delivery rates using the Naive-Bayes classifier

The importance of accurate estimation of software development effort is well recognized in software engineering. In recent years, machine learning approaches have been studied as possible alternatives to more traditional software cost estimation methods. The objective of this paper is to investigate the utility of the machine learning algorithm known as the Naive-Bayes classifier for estimating software project effort. We present empirical experiments with the Benchmark 6 data set from the International Software Benchmarking Standards Group to estimate project delivery rates, and we compare the performance of the Naive-Bayes approach to that of two other machine learning methods: model trees and neural networks. A project delivery rate is defined as the number of effort hours per function point. The approach described is general and can be used to analyse not only software development data but also data on software maintenance and other software engineering activities. The paper demonstrates that the Naive-Bayes classifier has the potential to be used as an alternative machine learning tool for software development effort estimation.
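
As a rough illustration of the quantities named above, the following Python sketch computes a project delivery rate as effort hours divided by function points, bins it into classes, and fits a small categorical Naive-Bayes classifier with Laplace smoothing. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the cut points, the feature names (`language`, `platform`), the class labels, and the six example projects are all hypothetical and are not taken from the ISBSG data.

```python
import math
from collections import Counter, defaultdict


def delivery_rate(effort_hours, function_points):
    """Project delivery rate: effort hours per function point."""
    if function_points <= 0:
        raise ValueError("function_points must be positive")
    return effort_hours / function_points


def pdr_class(pdr, cut_points=(5.0, 15.0)):
    """Bin a delivery rate into a class label (cut points are hypothetical)."""
    labels = ("low", "medium", "high")
    for label, cut in zip(labels, cut_points):
        if pdr < cut:
            return label
    return labels[-1]


class NaiveBayes:
    """Categorical Naive-Bayes classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # value_counts[(feature, class)][value] -> number of training rows
        self.value_counts = defaultdict(Counter)
        # values[feature] -> set of distinct values seen in training
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[(feature, label)][value] += 1
                self.values[feature].add(value)
        return self

    def log_scores(self, row):
        """Unnormalised log posterior for each class, assuming the
        features are conditionally independent given the class."""
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for feature, value in row.items():
                seen = self.value_counts[(feature, label)][value]
                k = len(self.values[feature])
                score += math.log((seen + self.alpha) / (count + self.alpha * k))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical projects: (attributes, effort hours, function points).
projects = [
    ({"language": "COBOL", "platform": "mainframe"}, 2400, 200),
    ({"language": "COBOL", "platform": "mainframe"}, 3600, 180),
    ({"language": "Java", "platform": "pc"}, 400, 100),
    ({"language": "Java", "platform": "pc"}, 450, 150),
    ({"language": "Java", "platform": "mainframe"}, 1000, 100),
    ({"language": "COBOL", "platform": "mainframe"}, 5000, 200),
]

rows = [attrs for attrs, _, _ in projects]
labels = [pdr_class(delivery_rate(hours, fp)) for _, hours, fp in projects]
model = NaiveBayes().fit(rows, labels)
print(model.predict({"language": "Java", "platform": "pc"}))
```

Binning the delivery rate turns effort estimation into a classification task, which is what a Naive-Bayes classifier requires; how finely to bin, and where to place the boundaries, is a design choice this sketch leaves open.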
