Hardware read-write lock elision

Hardware Lock Elision (HLE) represents a promising technique to enhance parallelism of concurrent applications relying on conventional, lock-based synchronization. The idea at the basis of current HLE approaches is to wrap critical sections into hardware transactions: this allows critical sections to be executed in parallel using a speculative approach, while leveraging on conflict detection capabilities provided by hardware transactions to ensure equivalent semantics to pessimistic lock-based synchronization. In this paper we present RW-LE, the first HLE approach targeting read-write locks. RW-LE introduces an innovative hardware-software co-design that exploits two recent micro-architectural features of POWER8 processors: suspending/resuming transaction execution and rollback-only transactions. RW-LE's original design provides two major benefits with respect to existing HLE techniques: i) eliding the read lock without resorting to the use of hardware transactions, and ii) avoiding to track read memory accesses issued in the write critical section. We evaluate RW-LE by means of an extensive experimental study based on a variety of benchmarks and real-life, complex applications. Our results demonstrate that RW-LE can provide striking performance gain of up to one order of magnitude with respect to state of the art HLE approaches.

[1]  Paul E. McKenney,et al.  READ-COPY UPDATE: USING EXECUTION HISTORY TO SOLVE CONCURRENCY PROBLEMS , 2002 .

[2]  Timothy J. Slegel,et al.  Transactional Memory Architecture and Implementation for IBM System Z , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[3]  Jonathan Walpole,et al.  Performance of memory reclamation for lockless synchronization , 2007, J. Parallel Distributed Comput..

[4]  James R. Goodman,et al.  Speculative lock elision: enabling highly concurrent multithreaded execution , 2001, MICRO.

[5]  Nir Shavit,et al.  Read-log-update: a lightweight synchronization mechanism for concurrent programming , 2015, SOSP.

[6]  Nir Shavit,et al.  Pessimistic Software Lock-Elision , 2012, DISC.

[7]  Joel Emer,et al.  Hardware Support for Thread-Level Speculation , 2003 .

[8]  Yehuda Afek,et al.  Reduced Hardware Lock Elision , 2014 .

[9]  Amitabha Roy,et al.  A runtime system for software lock elision , 2009, EuroSys '09.

[10]  Christoforos E. Kozyrakis,et al.  Transactional programming in a multi-core environment , 2007, PPoPP.

[11]  Yehuda Afek,et al.  Software-improved hardware lock elision , 2014, PODC '14.

[12]  Jan Vitek,et al.  STMBench7: a benchmark for software transactional memory , 2007, EuroSys '07.

[13]  Jing Wang,et al.  Protecting Private Keys against Memory Disclosure Attacks Using Hardware Transactional Memory , 2015, 2015 IEEE Symposium on Security and Privacy.

[14]  Paul E. McKenney Memory ordering in modern microprocessors, Part I , 2005 .

[15]  Irina Calciu,et al.  Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory , 2014 .

[16]  Maged M. Michael,et al.  Quantitative comparison of Hardware Transactional Memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8 , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[17]  Mark Moir,et al.  Hardware extensions to make lazy subscription safe , 2014, ArXiv.

[18]  Yujie Liu,et al.  Case Study: Using Transactions in Memcached , 2015, Transactional Memory.

[19]  Christopher J. Hughes,et al.  Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[20]  Nuno Diegues,et al.  Self-Tuning Intel Transactional Synchronization Extensions , 2014, ICAC.

[21]  Maged M. Michael,et al.  Transactional memory support in the IBM POWER8 processor , 2015, IBM J. Res. Dev..

[22]  Luís E. T. Rodrigues,et al.  Virtues and limitations of commodity hardware transactional memory , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[23]  Haibo Chen,et al.  Scalable Read-mostly Synchronization Using Passive Reader-Writer Locks , 2014, USENIX Annual Technical Conference.

[24]  Mark Moir,et al.  Adaptive integration of hardware and software lock elision techniques , 2014, SPAA.

[25]  Timothy L. Harris,et al.  A Pragmatic Implementation of Non-blocking Linked-Lists , 2001, DISC.

[26]  Maged M. Michael,et al.  Robust architectural support for transactional memory in the power architecture , 2013, ISCA.