Data Management: The Spirit to Pursuit Peak Performance on Many-Core Processor
Dongrui Fan | Shuai Zhang | Yongbin Zhou | Junchao Zhang | Nan Yuan