Unlocking Energy

Locks are a natural place to improve the energy efficiency of software systems. First, concurrent systems are mainstream, and when their threads synchronize, they typically do so with locks. Second, locks are well-defined abstractions, so the algorithm implementing them can be changed without modifying the system. Third, some locking strategies consume more power than others, so the choice of strategy can have a real effect. Last but not least, as we show in this paper, improving the energy efficiency of locks goes hand in hand with improving their throughput: it is a win-win situation. We make our case for this correlation between throughput and energy efficiency through a series of observations obtained from an exhaustive analysis of the energy efficiency of locks on two modern processors and six software systems: Memcached, MySQL, SQLite, RocksDB, HamsterDB, and Kyoto Cabinet. We propose simple lock-based techniques that improve the energy efficiency of these systems by 33% on average, driven by higher throughput, and without modifying the systems.
