Comparing Software Prediction Techniques Using Simulation

The need for accurate software prediction systems increases as software becomes much larger and more complex. We believe that the underlying characteristics: size, number of features, type of distribution, etc., of the data set influence the choice of the prediction system to be used. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristic. It would also be useful to have a large validation data set. Our solution is to simulate data allowing both control and the possibility of large (1000) validation cases. The authors compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We observed that the more "messy" the data and the more complex the relationship with the dependent variable, the more variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set that has been sampled from the underlying data set. However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context rather than which is the "best" prediction system.

[1]  Janet L. Kolodner,et al.  Case-Based Reasoning , 1989, IJCAI 1989.

[2]  Hans van Vliet,et al.  Predicting maintenance effort with function points , 1997, 1997 Proceedings International Conference on Software Maintenance.

[3]  Mauricio Amaral de Almeida,et al.  An investigation on the use of machine learned models for estimating correction costs , 1998, Proceedings of the 20th International Conference on Software Engineering.

[4]  Ronald Gulezian Reformulating and calibrating COCOMO , 1991, J. Syst. Softw..

[5]  Barbara Kitchenham,et al.  The MERMAID Approach to software cost estimation , 1990 .

[6]  David Leake,et al.  Case-Based Reasoning: Experiences, Lessons and Future Directions , 1996 .

[7]  B. Efron,et al.  A Leisurely Look at the Bootstrap, the Jackknife, and , 1983 .

[8]  Stephen G. MacDonell Metrics for database systems: an empirical study , 1997, Proceedings Fourth International Software Metrics Symposium.

[9]  James Miller Can results from software engineering experiments be safely combined? , 1999, Proceedings Sixth International Software Metrics Symposium (Cat. No.PR00403).

[10]  Barbara A. Kitchenham,et al.  An investigation of analysis techniques for software datasets , 1999, Proceedings Sixth International Software Metrics Symposium (Cat. No.PR00403).

[11]  Martin J. Shepperd,et al.  Estimating Software Project Effort Using Analogies , 1997, IEEE Trans. Software Eng..

[12]  Ingunn Myrtveit,et al.  A Controlled Experiment to Assess the Benefits of Estimating with Analogy and Regression Models , 1999, IEEE Trans. Software Eng..

[13]  John Jenkins,et al.  Cost-estimation by analogy as a good management practice , 1988 .

[14]  Barbara A. Kitchenham,et al.  Empirical studies of assumptions that underlie software cost-estimation models , 1992, Inf. Softw. Technol..

[15]  Adam A. Porter,et al.  Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis , 1988, IEEE Trans. Software Eng..

[16]  J. Neter,et al.  Applied linear statistical models : regression, analysis of variance, and experimental designs , 1974 .

[17]  J. Ross Quinlan,et al.  Learning decision tree classifiers , 1996, CSUR.

[18]  Barry W. Boehm,et al.  Software Engineering Economics , 1993, IEEE Transactions on Software Engineering.

[19]  Michael J. Prietula,et al.  Examining the Feasibility of a Case-Based Reasoning Model for Software Effort Estimation , 1992, MIS Q..

[20]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[21]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[22]  Gavin R. Finnie,et al.  Estimating software development effort with connectionist models , 1997, Inf. Softw. Technol..

[23]  Marija J. Norusis,et al.  SPSS for Windows Base System User''s Guide , 1992 .

[24]  Michael J. Prietula,et al.  Case-Based Reasoning in Software Effort estimation , 1990, International Conference on Interaction Sciences.

[25]  Kevin Barraclough,et al.  I and i , 2001, BMJ : British Medical Journal.

[26]  Lionel C. Briand,et al.  A replicated assessment and comparison of common software cost modeling techniques , 2000, Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.

[27]  Barbara Kitchenham,et al.  Software cost models , 1984 .