Paper Information - Liberty Queues for EPIC Architectures

Core-to-core communication bandwidth is critical for high-performance pipeline-parallel programs. Hardware communication queues are unlikely to be implemented and are perhaps unnecessary. This paper presents Liberty Queues, a high-performance, lock-free, software-only ring buffer, and describes the effort of porting it from the original x86-64 implementation to IA-64. Liberty Queues achieve a bandwidth of 500 MB/s between unrelated processors on a first-generation Itanium 2, compared with the 281 MB/s on modern Opterons and 430 MB/s on modern Xeons claimed by related work. We present bandwidth results for seven different multicore and multiprocessor systems, as well as a sensitivity analysis.
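To make the term "lock-free software-only ring buffer" concrete, the following is a minimal sketch of a generic single-producer, single-consumer ring buffer in C11. It is not the Liberty Queue implementation, which the abstract does not detail. The names spsc_ring, ring_init, ring_push, ring_pop, and RING_CAPACITY are hypothetical, and the sketch assumes exactly one producer thread and one consumer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity chosen for illustration; must be a power of two
 * so that index wrapping can use a bit mask. */
#define RING_CAPACITY 8u
static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
              "RING_CAPACITY must be a power of two");

/* Assumption: one producer and one consumer. head and tail are
 * monotonically increasing counters, masked only when indexing slots. */
typedef struct {
    _Atomic size_t head;            /* next slot to read; stored only by the consumer */
    _Atomic size_t tail;            /* next slot to write; stored only by the producer */
    uint64_t slots[RING_CAPACITY];
} spsc_ring;

static void ring_init(spsc_ring *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false without blocking when the ring is full. */
static bool ring_push(spsc_ring *q, uint64_t value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY) {
        return false;               /* full */
    }
    q->slots[tail & (RING_CAPACITY - 1u)] = value;
    /* Release store publishes the slot contents before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false without blocking when the ring is empty. */
static bool ring_pop(spsc_ring *q, uint64_t *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) {
        return false;               /* empty */
    }
    *out = q->slots[head & (RING_CAPACITY - 1u)];
    /* Release store hands the slot back to the producer. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

In this sketch neither side takes a lock: each counter has a single writer, and the acquire/release pairs order the slot accesses against the counter updates. A tuned queue would likely also keep head and tail on separate cache lines and batch updates to reduce coherence traffic, which this sketch omits for brevity.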
