FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue
John Giacomoni | Tipp Moseley | Manish Vachharajani
[1] Calton Pu,et al. Threads and input/output in the Synthesis kernel , 1989, SOSP '89.
[2] Maged M. Michael,et al. Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors , 1998, J. Parallel Distributed Comput.
[3] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[4] Mikko H. Lipasti,et al. Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing , 2001, MICRO.
[5] Saman P. Amarasinghe. Multicores from the Compiler's Perspective: A Blessing or a Curse? , 2005, CGO.
[6] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[7] Theodore Johnson,et al. A Nonblocking Algorithm for Shared Queues Using Compare-and-Swap , 1994, IEEE Trans. Computers.
[8] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[9] Guilherme Ottoni,et al. From sequential programs to concurrent threads , 2006, IEEE Computer Architecture Letters.
[10] Harrick M. Vin,et al. Overcoming the memory wall in packet processing , 2005.
[11] Patrick Crowley,et al. Exploiting locality to ameliorate packet queue contention and serialization , 2006, CF '06.
[12] Thomas E. Anderson,et al. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors , 1990, IEEE Trans. Parallel Distributed Syst.
[13] John David Valois. Lock-free data structures , 1996.
[14] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[15] Janak H. Patel,et al. A low-overhead coherence solution for multiprocessors with private cache memories , 1984, ISCA '84.
[16] Long Li,et al. Automatically partitioning packet processing applications for pipelined architectures , 2005, PLDI '05.
[17] Yi Zhang,et al. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems , 2001, SPAA '01.
[18] Kourosh Gharachorloo,et al. Detecting violations of sequential consistency , 1991, SPAA '91.
[19] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[20] Vikram A. Saletore,et al. ETA: experience with an Intel Xeon processor as a packet processing engine , 2004, IEEE Micro.
[21] Milind Girkar,et al. Automatic Extraction of Functional Parallelism from Ordinary Programs , 1992, IEEE Trans. Parallel Distributed Syst.
[22] William N. Scherer,et al. Scalable synchronous queues , 2009, Commun. ACM.
[23] Nir Shavit,et al. An Optimistic Approach to Lock-Free FIFO Queues , 2004, DISC.
[24] C. A. R. Hoare,et al. Communicating sequential processes , 1978, CACM.
[25] Dirk Grunwald,et al. A stateless, content-directed data prefetching mechanism , 2002, ASPLOS X.
[26] Mark Moir,et al. Using elimination to implement scalable and lock-free FIFO queues , 2005, SPAA '05.
[27] Maurice Herlihy,et al. Linearizability: a correctness condition for concurrent objects , 1990, TOPLAS.
[28] Leslie Lamport,et al. Specifying Concurrent Program Modules , 1983, TOPLAS.