Rigel: an architecture and scalable programming interface for a 1000-core accelerator
Sanjay J. Patel | Steven S. Lumetta | Daniel R. Johnson | William Tuohy | John H. Kelm | Aqeel Mahesri | Neal Clayton Crago | Matthew I. Frank | Matthew R. Johnson
[1] Sanjay J. Patel,et al. Tradeoffs in designing accelerator architectures for visual computing , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[2] William J. Dally,et al. Programmable Stream Processors , 2003, Computer.
[3] Eric Darve,et al. N-Body simulation on GPUs , 2006, SC.
[4] Michael Gschwind. Chip multiprocessing and the cell broadband engine , 2006, CF '06.
[5] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[6] Sivarama P. Dandamudi,et al. A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems , 1995, IEEE Trans. Parallel Distributed Syst..
[7] Jonathan Chang,et al. A 45 nm 8-Core Enterprise Xeon® Processor , 2010, IEEE J. Solid State Circuits.
[8] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[9] James R. Goodman,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[10] Klaus Schulten,et al. Accelerating Molecular Modeling Applications with GPU Computing , 2009 .
[11] Pradeep Dubey,et al. Larrabee: A Many-Core x86 Architecture for Visual Computing , 2009, IEEE Micro.
[12] Guy E. Blelloch,et al. Scans as Primitive Parallel Operations , 1989, ICPP.
[13] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[14] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[15] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[16] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[17] Daniel Gajski,et al. CEDAR: a large scale multiprocessor , 1983, CARN.
[18] Stefan Rusu,et al. A 45nm 8-core enterprise Xeon® processor , 2009 .
[19] Mike Houston,et al. GPUs: a closer look , 2008, SIGGRAPH '08.
[20] Burton J. Smith,et al. The architecture of HEP , 1985 .
[21] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[22] Marc Tremblay,et al. A Third-Generation 65nm 16-Core 32-Thread Plus 32-Scout-Thread CMT SPARC® Processor , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[23] Norman P. Jouppi,et al. Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[24] David A. Padua,et al. Hierarchically tiled arrays for parallelism and locality , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[25] Justin P. Haldar,et al. Accelerating advanced MRI reconstructions on GPUs , 2008, J. Parallel Distributed Comput..
[26] William J. Dally,et al. Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.
[27] William J. Dally,et al. Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.
[28] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.