On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects

The growing popularity of machine learning (ML) applications has led to the introduction of software engineering tools such as Data Version Control (DVC), MLflow and Pachyderm, which enable versioning of ML data, models, pipelines and model evaluation metrics. Since these versioned ML artifacts need to be synchronized not only with each other, but also with the source and test code of the software applications into which the models are integrated, prior findings on co-evolution and coupling between software artifacts might need to be revisited. Hence, in order to understand the degree of coupling between ML-related and other software artifacts, as well as the adoption of ML versioning features, this paper empirically studies the usage of DVC in 391 GitHub projects, 25 of them in detail. Our results show that more than half of the DVC files in a project are changed at least once every one-tenth of the project’s lifetime. Furthermore, we observe a tight coupling between DVC files and other artifacts: one in four pull requests that change source code, and one in two pull requests that change tests, also require a change to DVC files. As additional evidence of the complexity associated with adopting ML-related software engineering tools like DVC, an average of 78% of the studied projects showed a non-constant trend in pipeline complexity.
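The coupling figures in the abstract can be read as co-change ratios: of the pull requests that touch one kind of artifact, the share that also touch DVC files. The sketch below illustrates that idea only and is not the study's actual procedure. The file classification rules (`is_dvc`, `is_test`, `is_source`), the function `co_change_ratio`, and the sample pull requests are all hypothetical; the only grounded detail is that DVC stores its metadata in `*.dvc`, `dvc.yaml` and `dvc.lock` files.

```python
from pathlib import PurePosixPath


def is_dvc(path):
    # Assumption: a DVC file is a *.dvc file, dvc.yaml, or dvc.lock.
    name = PurePosixPath(path).name
    return name.endswith(".dvc") or name in {"dvc.yaml", "dvc.lock"}


def is_test(path):
    # Assumption: a crude heuristic, any file whose name contains "test".
    return "test" in PurePosixPath(path).name.lower()


def is_source(path):
    # Assumption: source code means non-test Python files.
    return path.endswith(".py") and not is_test(path)


def co_change_ratio(pull_requests, is_category):
    """Share of PRs touching `is_category` files that also touch DVC files."""
    touching = [files for files in pull_requests
                if any(is_category(f) for f in files)]
    if not touching:
        return 0.0
    both = [files for files in touching if any(is_dvc(f) for f in files)]
    return len(both) / len(touching)


# Hypothetical data: each pull request is the set of file paths it changes.
prs = [
    {"src/train.py", "dvc.yaml"},
    {"src/train.py"},
    {"tests/test_train.py", "data/raw.dvc"},
    {"README.md"},
    {"src/model.py", "tests/test_model.py"},
]

print("source -> DVC:", co_change_ratio(prs, is_source))
print("tests  -> DVC:", co_change_ratio(prs, is_test))
```

With real data, the sets of changed paths would come from a repository's pull request history, and the classification rules would need to match the project's languages and layout.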
