A longitudinal analysis of bloated Java dependencies

We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we study across a total of 31,515 versions of Maven dependency trees. The number of bloated dependencies steadily increases over time, and 89.2% of the direct dependencies that are bloated remain bloated in all subsequent versions of the studied projects. This empirical evidence suggests that developers can safely remove a dependency once it is found to be bloated. We further report novel insights regarding the unnecessary maintenance effort induced by bloat: 22% of dependency updates performed by developers are made on bloated dependencies, and Dependabot suggests a similar ratio of updates on bloated dependencies.
