Architectural Support for Dynamic Linking

All software in use today relies on libraries, including standard libraries (e.g., C, C++) and application-specific libraries (e.g., libxml, libpng). Most libraries are loaded in memory and dynamically linked when programs are launched, resolving symbol addresses across the applications and libraries. Dynamic linking has many benefits: It allows code to be reused between applications, conserves memory (because only one copy of a library is kept in memory for all the applications that share it), and allows libraries to be patched and updated without modifying programs, among numerous other benefits. However, these benefits come at the cost of performance. For every call made to a function in a dynamically linked library, a trampoline is used to read the function address from a lookup table and branch to the function, incurring memory load and branch operations. Static linking avoids this performance penalty, but loses all the benefits of dynamic linking. Given its myriad benefits, dynamic linking is the predominant choice today, despite the performance cost. In this work, we propose a speculative hardware mechanism to optimize dynamic linking by avoiding executing the trampolines for library function calls, providing the benefits of dynamic linking with the performance of static linking. Speculatively skipping the memory load and branch operations of the library call trampolines improves performance by reducing the number of executed instructions and gains additional performance by reducing pressure on the instruction and data caches, TLBs, and branch predictors. Because the indirect targets of library call trampolines do not change during program execution, our speculative mechanism never misspeculates in practice. We evaluate our technique on real hardware with production software and observe up to 4% speedup using only 1.5KB of on-chip storage.

[1]  A. Seznec,et al.  Trading Conflict And Capacity Aliasing In Conditional Branch Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[2]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[3]  Hovav Shacham,et al.  On the effectiveness of address-space randomization , 2004, CCS '04.

[4]  Michael Franz Dynamic Linking of Software Components , 1997, Computer.

[5]  Daniel A. Jiménez,et al.  Reconsidering complex branch predictors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[6]  Sarita V. Adve,et al.  Performance of database workloads on shared-memory systems with out-of-order processors , 1998, ASPLOS VIII.

[7]  Onur Mutlu,et al.  VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization , 2007, ISCA '07.

[8]  Carlo Curino,et al.  OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases , 2013, Proc. VLDB Endow..

[9]  Brad Fitzpatrick,et al.  Distributed caching with memcached , 2004 .

[10]  G.S. Sohi,et al.  Dynamic instruction reuse , 1997, ISCA '97.

[11]  Michael Franz,et al.  Continuous program optimization: A case study , 2003, TOPL.

[12]  Anant Agarwal,et al.  Evaluating the performance of software cache coherence , 1989, ASPLOS III.

[13]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[14]  David A. Padua,et al.  Advanced compiler optimizations for supercomputers , 1986, CACM.

[15]  Yale N. Patt,et al.  A two-level approach to making class predictions , 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.

[16]  Edward S. Kablaoui Samba logging for audit trails , 2004 .

[17]  Jian Huang,et al.  Exploiting basic block value locality with block reuse , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[18]  Frank Piessens,et al.  JITSec: Just-in-time security for code injection attacks , 2010 .

[19]  Donald E. Porter,et al.  Rethinking the library OS from the top down , 2011, ASPLOS XVI.