Lessons Learned and Results from Applying Data-Driven Cost Estimation to Industrial Data Sets

The increasing availability of cost-relevant data in industry allows companies to apply data-intensive estimation methods. However, the available data are often inconsistent, invalid, or incomplete, so most existing data-intensive estimation methods cannot be applied. Only a few estimation methods, such as optimized set reduction (OSR), can deal with imperfect data to a certain extent, and results from evaluating these methods in practical environments are rare. This article describes a case study on the application of OSR at Toshiba Information Systems (Japan) Corporation. An important result of the case study is that estimation accuracy varies significantly with the data sets used and with the way these data are preprocessed. The study supports current results in the area of quantitative cost estimation and clearly illustrates typical problems. Experiences, lessons learned, and recommendations with respect to data preprocessing and data-intensive cost estimation in general are presented.
