An Investigation of Nonparametric Data Mining Techniques for Acquisition Cost Estimating

The Department of Defense (DoD) cost estimating methodology traditionally focuses on parametric estimating using ordinary least squares (OLS) regression. Given the recent advances in acquisition data collection, however, senior leaders have expressed an interest in incorporating “data mining” and “more innovative analyses” within cost estimating. Thus, the goal of this research is to investigate nonparametric data mining techniques and their application to DoD cost estimating. Using a meta-analysis of 14 cost estimating studies containing 32 datasets that predominantly relate to commercial software development, the predictive accuracy of OLS regression is measured against three nonparametric data mining techniques. The meta-analysis results indicate that, on average, the nonparametric techniques outperform OLS regression for cost estimating. Follow-on data mining research that incorporates DoD-specific acquisition cost data is recommended to extend this article’s findings.
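The comparison described above, scoring OLS regression against a nonparametric technique on held-out data, can be sketched as follows. This is only an illustration under stated assumptions, not the study's procedure: the abstract does not name the three nonparametric techniques or the accuracy measure, so k-nearest-neighbor regression and leave-one-out mean absolute percentage error (MAPE) are stand-ins, and the size and cost figures are synthetic. The function names `fit_ols`, `fit_knn`, and `loo_mape` are hypothetical.

```python
# Illustrative sketch only: technique, metric, and data are assumptions,
# not taken from the study summarized above.

def fit_ols(xs, ys):
    """Fit a one-predictor OLS line and return a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=3):
    """Return a k-nearest-neighbor regressor (mean of the k closest costs)."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def loo_mape(fit, xs, ys):
    """Leave-one-out MAPE (percent): refit without each point, then predict it."""
    errors = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(abs(model(xs[i]) - ys[i]) / abs(ys[i]))
    return 100.0 * sum(errors) / len(errors)

# Synthetic, made-up data: cost grows nonlinearly with project size.
sizes = [float(s) for s in range(1, 21)]
costs = [10.0 * s ** 1.5 for s in sizes]

ols_mape = loo_mape(fit_ols, sizes, costs)
knn_mape = loo_mape(fit_knn, sizes, costs)
print(f"OLS  leave-one-out MAPE: {ols_mape:.1f}%")
print(f"k-NN leave-one-out MAPE: {knn_mape:.1f}%")
```

Which method scores better depends entirely on the data. A meta-analysis of the kind described would aggregate such accuracy comparisons across many datasets rather than rely on a single one.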
