A framework for capturing, statistically modeling and analyzing the evolution of software models

New framework for capturing, statistically modeling and simulating the evolution of models.
Evolution is formulated at two abstraction levels: low-level and high-level changes between models.
Empirical study of the evolution of Java models using three kinds of time series models.
Mixed ARMA-GARCH models were superior, but ARMA models performed well in practice.
Statistical models were used to generate more realistic test models for MDE tools.

This paper presents a new methodological framework for capturing and statistically modeling the evolution of models in model-driven software development. The framework captures the changes between revisions of models in terms of both low-level (internal) and high-level (developer-visible) edit operations applied between revisions. In our approach, evolution is modeled statistically using ARMA, GARCH and mixed ARMA-GARCH models. The forecasting and simulation aspects of these time series models are thoroughly assessed. The suitability of the framework is shown by applying it to a large set of design models of real Java systems. Our analysis shows that mixed ARMA-GARCH models are superior to ARMA models.

A main motivation for, and application of, the resulting statistical models is to control the generation of realistic model histories, which are intended to be used for testing model versioning tools. We present the architecture of the model generator and show how to generate random sequences from the statistical models which control the generation process. Further uses of the statistical models include various forecasting and simulation tasks.
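To make the idea of generating random sequences from a fitted time series model concrete, the following is a minimal sketch of simulating a mixed ARMA(1,1)-GARCH(1,1) process. It is not the paper's generator: the function name `simulate_arma_garch`, the model orders, the Gaussian innovations, the burn-in length and all parameter values are assumptions chosen for illustration only.

```python
import random


def simulate_arma_garch(n, mu=0.0, phi=0.5, theta=0.3,
                        omega=0.1, alpha=0.1, beta=0.8,
                        burn_in=200, seed=None):
    """Simulate n values of an ARMA(1,1) process with GARCH(1,1) errors.

    Mean equation:      x_t = mu + phi*(x_{t-1} - mu) + eps_t + theta*eps_{t-1}
    Variance equation:  sigma2_t = omega + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}
    Innovations:        eps_t = sqrt(sigma2_t) * z_t,  z_t ~ N(0, 1)

    All default parameter values are hypothetical, not estimates from the paper.
    """
    if abs(phi) >= 1.0:
        raise ValueError("phi must satisfy |phi| < 1 for a stationary mean")
    if omega <= 0.0 or alpha < 0.0 or beta < 0.0 or alpha + beta >= 1.0:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")

    rng = random.Random(seed)
    # Start the variance recursion at the unconditional variance.
    sigma2 = omega / (1.0 - alpha - beta)
    eps_prev = 0.0
    x_prev = mu
    series = []

    for t in range(n + burn_in):
        # Conditional variance depends on the previous shock and variance.
        sigma2 = omega + alpha * eps_prev ** 2 + beta * sigma2
        eps = rng.gauss(0.0, 1.0) * sigma2 ** 0.5
        x = mu + phi * (x_prev - mu) + eps + theta * eps_prev
        # Discard the first burn_in values to reduce start-up effects.
        if t >= burn_in:
            series.append(x)
        eps_prev, x_prev = eps, x

    return series


if __name__ == "__main__":
    for value in simulate_arma_garch(5, seed=42):
        print(round(value, 3))
```

In a generator like the one described, such a sequence would presumably have to be mapped back to non-negative counts of edit operations per revision (for example by inverting a transformation and rounding); that step depends on the fitted model and is not shown here.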
