Paper Information - A Task-Centric Memory Model for Scalable Accelerator Architectures

A Task-Centric Memory Model for Scalable Accelerator Architectures

This article presents a memory model for parallel compute accelerators with task-based programming models that uses a software protocol, working in collaboration with hardware caches, to maintain a coherent, single address space view of memory without requiring hardware cache coherence. The memory model supports visual computing applications, which are becoming an important class of workloads capable of exploiting 1,000-core processors.

Sanjay J. Patel | Steven S. Lumetta | Daniel R. Johnson | John H. Kelm | Matthew I. Frank

[1] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[2] Sanjay J. Patel, et al. Tradeoffs in designing accelerator architectures for visual computing, 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[3] Kourosh Gharachorloo, et al. Shasta: a low overhead, software-only approach for supporting fine-grain shared memory, 1996, ASPLOS VII.

[4] James R. Goodman, et al. Cache Consistency and Sequential Consistency, 1991.

[5] James R. Larus, et al. Cooperative shared memory: software and hardware for scalable multiprocessors, 1993, TOCS.

[6] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.

[7] Daniel Gajski, et al. CEDAR: a large scale multiprocessor, 1983, CARN.

[8] Michael Gschwind. Chip multiprocessing and the cell broadband engine, 2006, CF '06.

[9] Sanjay J. Patel, et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator, 2009, ISCA '09.

[10] Willy Zwaenepoel, et al. Munin: distributed shared memory based on type-specific memory coherence, 1990, PPOPP '90.

[11] Christoforos E. Kozyrakis, et al. Comparing memory systems for chip multiprocessors, 2007, ISCA '07.

[12] Alan L. Cox, et al. TreadMarks: shared memory computing on networks of workstations, 1996.

[13] Brian N. Bershad, et al. The Midway distributed shared memory system, 1993, Digest of Papers. Compcon Spring.

[14] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.

[15] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.

[16] Liviu Iftode, et al. Scope consistency: a bridge between release consistency and entry consistency, 1996, SPAA '96.

[17] Matteo Frigo, et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.