Experience from replicating empirical studies on prediction models

Replications are important for investigating the generality of empirical studies. By replicating a study in another context, we can investigate what impact the specific environment has on the effect of the studied object. In this paper, we define different levels of replication to characterise the similarities and differences between an original study and a replication, with a particular focus on prediction models for identifying fault-prone software components. Further, we derive a set of issues and concerns that must be addressed to enable replication of an empirical study and to enable practitioners to use its results. To illustrate the importance of these issues, a replication case study is presented in the domain of prediction models for fault-prone software components. We conclude that the results diverge considerably depending on how different parameters are chosen, which demonstrates the need for well-documented empirical studies to enable replication and use.
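The sensitivity to parameter choices mentioned above can be illustrated with a minimal, invented sketch (this is not the paper's actual model; the component names, metrics, and thresholds below are hypothetical): a simple threshold classifier flags quite different sets of components as fault-prone depending on which metric and cutoff are chosen.

```python
# Hypothetical illustration only: components, metric values, and
# thresholds are invented for the sake of the example.
components = {
    "parser":    {"loc": 1200, "churn": 45},
    "scheduler": {"loc": 300,  "churn": 60},
    "logger":    {"loc": 150,  "churn": 5},
    "router":    {"loc": 800,  "churn": 20},
}

def flag_fault_prone(metrics, metric, threshold):
    """Return the set of components whose chosen metric exceeds the threshold."""
    return {name for name, m in metrics.items() if m[metric] > threshold}

# Two plausible parameter choices yield divergent predictions:
by_size  = flag_fault_prone(components, "loc", 500)    # flags large components
by_churn = flag_fault_prone(components, "churn", 30)   # flags frequently changed ones
```

Here `by_size` and `by_churn` overlap only partially, which is exactly why a replication must document which metrics and thresholds the original study used.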
