Software evolution and time series volatility: an empirical exploration

The paper presents the first empirical study to examine econometric time series volatility modeling in the context of software evolution. In econometrics, volatility refers to the conditional variance of a time series, rather than the conditional mean targeted in conventional regression analysis. The study relates these variance characteristics to the proximity of operating system releases, under the theoretical hypothesis that volatility increases near new milestone releases. The empirical experiment is a case study of FreeBSD. The analysis uses 12 time series related to bug tracking, development activity, and communication, covering the period from 1995 to 2011 at a daily sampling frequency. According to the results, the time series exhibit visible volatility characteristics, but these cannot be explained by the time windows around the six observed major FreeBSD releases. The paper thus contributes to software evolution research with new methodological ideas, as well as with both positive and negative empirical results.
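The distinction between conditional mean and conditional variance, and the idea of comparing variance inside and outside release windows, can be illustrated with a minimal sketch. It is not the paper's method or data: it assumes the simplest ARCH(1) process, and the parameter values, function names, and the squared-value variance proxy are all hypothetical choices made for illustration.

```python
import random


def simulate_arch1(n, omega=0.2, alpha=0.4, seed=0):
    """Simulate e_t = sqrt(h_t) * z_t with h_t = omega + alpha * e_{t-1}^2.

    omega, alpha, and seed are illustrative values, not estimates from any data.
    """
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        h = omega + alpha * prev ** 2          # conditional variance
        prev = (h ** 0.5) * rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def window_variance_ratio(xs, events, half_width):
    """Mean squared value inside event windows divided by that outside.

    `events` holds the time indices of the events (e.g. release days);
    a window spans `half_width` observations on each side of an event.
    """
    inside, outside = [], []
    for t, x in enumerate(xs):
        near = any(abs(t - e) <= half_width for e in events)
        (inside if near else outside).append(x ** 2)
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


if __name__ == "__main__":
    e = simulate_arch1(5000)
    print("lag-1 autocorrelation, levels: ", lag1_autocorr(e))
    print("lag-1 autocorrelation, squares:", lag1_autocorr([x * x for x in e]))
    # Hypothetical event days; the simulated series has no release effect.
    print("variance ratio:", window_variance_ratio(e, [1000, 2500, 4000], 30))
```

In such a series the levels are close to serially uncorrelated while the squares are not, which is the kind of structure a conditional mean model alone would miss. A variance ratio close to one would correspond to the negative result described above, where the event windows do not account for the volatility.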
