Characteristics, potentials, and limitations of open-source Simulink projects for empirical research

Simulink is an example of a successful application of the model-based development paradigm in industrial practice. Numerous companies create and maintain Simulink projects for modeling software-intensive embedded systems, aiming at early validation and automated code generation. However, Simulink projects are not as easily available as code-based ones, which profit from large publicly accessible open-source repositories, and this scarcity curbs empirical research. In this paper, we investigate a set of 1734 freely available Simulink models from 194 projects and analyze their suitability for empirical research. We analyze the projects considering (1) their development context, (2) their complexity in terms of size and organization within projects, and (3) their evolution over time. Our results show that there are both limitations and potentials for empirical research. On the one hand, some application domains dominate the development context, and there is a large number of models that can be considered toy examples of limited practical relevance. These often stem from an academic context, consist of only a few Simulink blocks, and are no longer (or have never been) under active development or maintenance. On the other hand, we found that a subset of the analyzed models is of considerable size and complexity. There are models comprising several thousand blocks, some of them highly modularized by hierarchically organized Simulink subsystems. Likewise, some of the models exhibit an active maintenance span of several years, which indicates that they are used as primary development artifacts throughout a project’s lifecycle. According to a discussion of our results with a domain expert, many models can be considered mature enough for quality analysis purposes, and they exhibit characteristics that can be considered representative of industry-scale models. Thus, we are confident that a subset of the models is suitable for empirical research.
More generally, using a publicly available model corpus or a dedicated subset enables researchers to replicate findings, publish subsequent studies, and use the models for validation purposes. We publish our dataset for the sake of replicating our results and fostering future empirical research.
