Machine Learning Approaches to Estimating Software Development Effort

Accurate estimation of software development effort is critical in software engineering. Underestimates create time pressures that can compromise full functional development and thorough testing. Overestimates, in contrast, can result in noncompetitive contract bids and/or overallocation of development resources and personnel. As a result, many models for estimating software development effort have been proposed. This article describes two machine learning methods, regression trees and neural networks, which we use to build estimators of software development effort from historical data. Our experiments indicate that these techniques are competitive with traditional estimators on one dataset. They also show that the methods are sensitive to the data on which they are trained. This cautionary note applies to any model-construction strategy that relies on historical data: every such model for software effort estimation should be evaluated by exploring its sensitivity across a variety of historical datasets.

Index Terms: Software development effort, machine learning, decision trees, regression trees, and neural networks.
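
The abstract's central caution is that a learned estimator is only as good as the historical data behind it. A small regression tree can make this concrete. The following is a minimal sketch, not the authors' implementation. It assumes a CART-style tree that greedily splits on whichever feature threshold most reduces squared error. The data are a hypothetical, synthetic set of twelve projects, each described by size in KSLOC and team size. The names `PROJECTS`, `build_tree`, and `predict` are illustrative only. The sketch trains the same method on two different slices of the history and compares their estimates for one new project.

```python
from statistics import mean

# Hypothetical, synthetic history: ((ksloc, team_size), effort in person-months).
# These numbers are invented for illustration; they are not from any real dataset.
PROJECTS = [
    ((10, 3), 24), ((12, 4), 30), ((15, 4), 38), ((20, 5), 52),
    ((25, 6), 70), ((30, 6), 85), ((40, 8), 120), ((50, 9), 160),
    ((60, 10), 200), ((80, 12), 290), ((100, 14), 380), ((120, 15), 470),
]


def sse(ys):
    """Sum of squared deviations from the mean (the regression-tree impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(pairs, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree from (features, effort) pairs.

    Returns a leaf (mean effort of its projects) or a tuple
    (feature_index, threshold, left_subtree, right_subtree).
    """
    ys = [y for _, y in pairs]
    best = None
    if depth < max_depth and len(pairs) >= 2 * min_leaf:
        parent = sse(ys)
        for f in range(len(pairs[0][0])):
            for t in sorted({row[f] for row, _ in pairs}):
                left = [p for p in pairs if p[0][f] <= t]
                right = [p for p in pairs if p[0][f] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                score = sse([y for _, y in left]) + sse([y for _, y in right])
                # Keep the split with the largest reduction in squared error.
                if score < parent and (best is None or score < best[0]):
                    best = (score, f, t, left, right)
    if best is None:
        return mean(ys)  # leaf: mean effort of the training projects that reach it
    _, f, t, left, right = best
    return (f, t,
            build_tree(left, depth + 1, max_depth, min_leaf),
            build_tree(right, depth + 1, max_depth, min_leaf))


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean effort."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Sensitivity check: the same method trained on two different histories.
small_only = build_tree(PROJECTS[:8])  # only projects of 50 KSLOC or less
everything = build_tree(PROJECTS)      # all twelve projects
new_project = (110, 14)
print(predict(small_only, new_project), predict(everything, new_project))
```

The tree trained only on the eight smaller projects estimates about 140 person-months for the new 110-KSLOC project. The tree trained on all twelve estimates about 380. Each leaf predicts the mean effort of the training projects that reach it, so a tree can never exceed the largest effort in its training history. This is one concrete way a model built from historical data fails outside the range of that data.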