Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-Starved Domains

Ideally, software engineering should be able to use machine learning to control or significantly reduce the costs of building software. In reality, there are very few examples of such applications early in the software life cycle. One reason for this scarcity is the lack of empirical data in the software engineering discipline. The problem is especially evident when constructing models to predict project effort, and it raises the question: how can sufficient data be generated when it is sparse? One approach is to assess projects from a bottom-up perspective, using estimates gathered at the product level to predict project effort. This paper conducts a set of machine learning experiments with software cost estimation data from two separate organizations. These experiments explore the feasibility of bottom-up project estimation and characterize its predictive potential within each of the two organizations. The results are assessed statistically, and a process is proposed for applying the described techniques.
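The abstract does not spell out the bottom-up procedure, so the sketch below only illustrates the general idea under stated assumptions. Each past project is broken into products, and every product contributes its own training row, so a handful of projects yields many more data points. A model is fitted at the product level, and a new project's effort is estimated by summing the predicted efforts of its products. Ordinary least squares on a single size metric is used purely for simplicity; it is not necessarily the learner used in the experiments described here. The `Product` fields, `fit_linear`, `estimate_project_effort`, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Product:
    """One product (component) of a project. Field names are hypothetical."""
    size: float    # assumed size metric for the product, e.g. lines of code
    effort: float  # assumed observed effort for the product, e.g. person-hours


def fit_linear(products: Sequence[Product]) -> tuple[float, float]:
    """Ordinary least-squares fit of effort = intercept + slope * size,
    trained on product-level rows pooled from past projects."""
    n = len(products)
    if n < 2:
        raise ValueError("need at least two products to fit a line")
    mean_x = sum(p.size for p in products) / n
    mean_y = sum(p.effort for p in products) / n
    sxx = sum((p.size - mean_x) ** 2 for p in products)
    if sxx == 0:
        raise ValueError("all product sizes are identical; slope is undefined")
    sxy = sum((p.size - mean_x) * (p.effort - mean_y) for p in products)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_project_effort(product_sizes: Sequence[float],
                            model: tuple[float, float]) -> float:
    """Bottom-up estimate: predict each product's effort, then sum the
    per-product predictions to obtain the project-level estimate."""
    intercept, slope = model
    return sum(intercept + slope * size for size in product_sizes)


if __name__ == "__main__":
    # Made-up history: four products pooled from earlier projects.
    history = [Product(10, 7), Product(20, 12), Product(30, 17), Product(40, 22)]
    model = fit_linear(history)
    print(model)                                     # (2.0, 0.5)
    print(estimate_project_effort([12, 28], model))  # 24.0
```

The design choice being illustrated is where the model is trained: fitting at the product level and aggregating upward is one way to obtain more training rows than a single row per project would provide, which matters when project-level data is sparse.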
