A Task-Centric Memory Model for Scalable Accelerator Architectures

This article presents a memory model for parallel compute accelerators that use task-based programming models. Instead of relying on hardware cache coherence, the model uses a software protocol that works together with hardware caches to maintain a coherent, single-address-space view of memory. The memory model supports visual computing applications, which are becoming an important class of workloads capable of exploiting 1,000-core processors.
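The idea of software-maintained coherence at task granularity can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the article's protocol. It assumes each core has a private write-back cache that is never kept coherent by hardware, and that software writes back dirty data when a task ends and invalidates cached data before a task begins. It also assumes the task scheduler runs a consumer task only after its producer task has finished. The class `Core` and the function `run_task` are hypothetical names chosen for illustration.

```python
class Core:
    """A core with a private, non-coherent write-back cache (word-granular for simplicity)."""

    def __init__(self, memory):
        self.memory = memory  # shared global memory: address -> value
        self.cache = {}       # address -> locally cached value
        self.dirty = set()    # addresses modified since the last write-back

    def load(self, addr):
        # A hit returns the cached copy even if memory has since changed:
        # there is no hardware coherence to invalidate it.
        if addr not in self.cache:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def store(self, addr, value):
        # Write-back cache: the update stays local until explicitly written back.
        self.cache[addr] = value
        self.dirty.add(addr)

    def writeback(self):
        # Software-initiated flush of every dirty word to global memory.
        for addr in self.dirty:
            self.memory[addr] = self.cache[addr]
        self.dirty.clear()

    def invalidate(self):
        # Write back dirty words first so no update is lost, then drop every
        # cached copy so later loads refetch from global memory.
        self.writeback()
        self.cache.clear()


def run_task(core, body):
    """Hypothetical task wrapper: invalidate at task start, write back at task end."""
    core.invalidate()
    result = body(core)
    core.writeback()
    return result


memory = {0x100: 1}
producer, consumer = Core(memory), Core(memory)
consumer.load(0x100)  # the consumer caches the old value 1

# Without the protocol, the producer's store stays in its own cache and the
# consumer hits its stale copy.
producer.store(0x100, 42)
stale = consumer.load(0x100)

# With the protocol, the producer task writes back when it ends and the
# consumer task invalidates when it starts, so the consumer sees the new value.
run_task(producer, lambda c: c.store(0x100, 42))
fresh = run_task(consumer, lambda c: c.load(0x100))
print(stale, fresh)  # prints "1 42"
```

Invalidating the entire cache at every task start is the simplest and most conservative choice for a sketch. A practical protocol would likely restrict write-backs and invalidations to the data a task actually shares, which reduces the overhead of keeping memory coherent in software.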
