Coverage-Based Debloating for Java Bytecode

Software bloat is code that is packaged in an application but is not actually necessary to run it. The presence of software bloat is an issue for security, performance, and maintenance. In this article, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for a single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture which parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We successfully debloat 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behaviour according to the workload. Our results indicate that 68.3% of the libraries’ bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating. For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
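The core idea, keeping only what a workload covers and removing the rest, can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name `CoverageDebloatSketch`, the `Class#method` identifier strings, and the choice of method-level granularity are all assumptions made for illustration, and the coverage set is supplied by hand rather than collected by a real coverage tool.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class CoverageDebloatSketch {

    // Keeps only the methods found in the coverage set. A class with no
    // covered method is dropped entirely. Identifiers are hypothetical
    // strings of the form "Class#method"; no real bytecode is rewritten here.
    public static Map<String, Set<String>> debloat(Map<String, Set<String>> classes,
                                                   Set<String> covered) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : classes.entrySet()) {
            Set<String> used = new LinkedHashSet<>(entry.getValue());
            used.retainAll(covered);
            if (!used.isEmpty()) {
                kept.put(entry.getKey(), used);
            }
        }
        return kept;
    }

    // Counts the methods across all classes of a project model.
    public static int countMethods(Map<String, Set<String>> classes) {
        int total = 0;
        for (Set<String> methods : classes.values()) {
            total += methods.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical library model: two classes, three methods.
        Map<String, Set<String>> library = new LinkedHashMap<>();
        library.put("Parser", Set.of("Parser#parse", "Parser#reset"));
        library.put("LegacyCodec", Set.of("LegacyCodec#encode"));

        // Hypothetical coverage report for one workload.
        Set<String> covered = Set.of("Parser#parse");

        Map<String, Set<String>> debloated = debloat(library, covered);
        System.out.println("Retained classes: " + debloated.keySet());
        System.out.println("Methods before: " + countMethods(library)
                + ", after: " + countMethods(debloated));
    }
}
```

Because the retained set depends entirely on the coverage data, the result is only as complete as the workload that produced it, which is why the debloated libraries are validated against their own workload and against client test suites.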
