The Emergence of Software Diversity in Maven Central

Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency on a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by the latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are used significantly more than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.
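
As an illustration of what "multiple versions that are actively used" could mean in practice, the following is a minimal sketch and not the authors' pipeline. It assumes that every record is a dependency declared by the latest version of some client library, and it treats a library version as actively used when at least one such record points at it. The record layout, the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (latest client artifact, library depended on, version used).
# Assumption: only dependencies declared by the latest version of each client are listed.
dependencies = [
    ("app-a:2.0", "lib-x", "1.0"),
    ("app-b:1.3", "lib-x", "2.0"),
    ("app-c:0.9", "lib-y", "1.1"),
    ("app-d:4.1", "lib-y", "1.1"),
]


def actively_used_versions(deps):
    """Map each library to the set of its versions used by at least one latest client."""
    used = defaultdict(set)
    for _client, library, version in deps:
        used[library].add(version)
    return dict(used)


def share_with_multiple_active_versions(deps):
    """Fraction of used libraries that have more than one actively used version."""
    used = actively_used_versions(deps)
    if not used:
        return 0.0
    multi = sum(1 for versions in used.values() if len(versions) > 1)
    return multi / len(used)


share = share_with_multiple_active_versions(dependencies)
```

In this toy input, lib-x is used in two distinct versions and lib-y in one, so half of the libraries count as having multiple actively used versions. The denominator here covers only libraries that appear in at least one dependency record, which may differ from the denominator used in the study.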