Paper Information - An Efficient Abortable-locking Protocol for Multi-level NUMA Systems

The popularity of Non-Uniform Memory Access (NUMA) architectures has led to numerous locality-preserving hierarchical lock designs, such as HCLH, HMCS, and cohort locks. Locality-preserving locks trade fairness for higher throughput. Hence, some acquisitions can incur long latencies, which may be intolerable for certain applications. Few locks allow a waiting thread to abandon its protocol on a timeout. State-of-the-art abortable locks are not fully locality aware, introduce high overheads, and are unsuitable for frequent aborts. Enhancing locality-aware locks with a lightweight timeout capability is critical for their adoption. In this paper, we design and evaluate the HMCS-T lock, a Hierarchical MCS (HMCS) lock variant that admits a timeout. HMCS-T maintains the locality benefits of HMCS while ensuring that aborts are lightweight. HMCS-T offers the progress guarantee missing in most abortable queuing locks. Our evaluations show that HMCS-T offers the timeout feature at a moderate overhead over its HMCS analog. HMCS-T, used in an MPI runtime lock, mitigated the poor scalability of an MPI+OpenMP BFS code and resulted in 4.3x better scaling.
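To make the term "abortable" concrete, the sketch below shows the kind of interface such a lock exposes: an `acquire` that returns `False` once a timeout expires instead of waiting indefinitely. It is not HMCS-T and has none of its locality or queuing properties. It is a plain test-and-set style spin lock, where abandoning a wait is easy because a waiter holds no queue position. The class name `AbortableSpinLock`, the `demo` helper, and the use of a non-blocking `threading.Lock` acquire as a stand-in for an atomic test-and-set are all assumptions made for illustration.

```python
import threading
import time


class AbortableSpinLock:
    """Test-and-set style lock whose acquire() gives up after a timeout.

    Hypothetical illustration only; this is not the HMCS-T protocol.
    """

    def __init__(self):
        # A non-blocking acquire on threading.Lock stands in for an
        # atomic test-and-set instruction (an assumption of this sketch).
        self._flag = threading.Lock()

    def acquire(self, timeout):
        """Spin until the lock is taken or `timeout` seconds have passed."""
        deadline = time.monotonic() + timeout
        while True:
            if self._flag.acquire(blocking=False):
                return True   # lock acquired
            if time.monotonic() >= deadline:
                return False  # abort: this waiter leaves no state behind
            time.sleep(0)     # yield so other threads can run

    def release(self):
        self._flag.release()


def demo():
    """Show one aborted acquisition and one successful acquisition."""
    lock = AbortableSpinLock()
    results = {}

    lock.acquire(timeout=1.0)  # the main thread holds the lock

    def waiter():
        # The lock stays held for the whole wait, so this call times out.
        results["while_held"] = lock.acquire(timeout=0.05)

    t = threading.Thread(target=waiter)
    t.start()
    t.join()

    lock.release()
    results["after_release"] = lock.acquire(timeout=0.05)
    lock.release()
    return results
```

In a queue-based lock such as MCS, a waiter that gives up has already linked its node into the queue, so leaving requires additional protocol steps. That extra work is what makes lightweight aborts harder there than in this sketch.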
