Mining software code repositories and bug databases using survival analysis models

Code repositories and bug databases contain valuable information about the software development process. Typical studies correlate code properties with the number of faults in a software module in order to identify error-prone modules. However, many of these studies ignore when faults occur, even though this time information can be retrieved from bug databases. To address this gap, we suggest applying survival analysis models, which are used in biostatistics and can handle time-dependent data. Because a large amount of raw data has to be evaluated statistically, we also discuss the automated retrieval and pre-processing of raw data from code repositories and bug databases.
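To illustrate what treating fault data as time-to-event data can look like, the following is a minimal sketch of the Kaplan-Meier estimator, a basic survival analysis technique. It is not taken from the paper and is not necessarily the model the authors use. The function name `kaplan_meier` and the sample values are hypothetical. The sketch assumes that each module contributes one duration (time until its first reported fault) and that modules with no reported fault by the end of the observation period are treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the first fault, or until observation ended
    observed:  True if a fault was reported, False if the module is censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # modules leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk modules that stay fault-free.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Faulty and censored modules both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Hypothetical data: days from release to first reported fault for six modules.
days = [30, 45, 45, 60, 90, 90]
fault_seen = [True, True, False, True, False, False]
curve = kaplan_meier(days, fault_seen)
for t, s in curve:
    print(f"day {t}: estimated fault-free fraction {s:.3f}")
```

Censored modules still count toward the risk set until their observation ends, which is the information a plain fault count per module discards.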
