SAMS multi-layout memory: providing multiple views of data to boost SIMD performance
[1] Paul Budnik,et al. The Organization and Use of Parallel Memories, 1971, IEEE Transactions on Computers.
[2] David T. Harper,et al. Conflict-Free Vector Access Using a Dynamic Storage Scheme , 1991, IEEE Trans. Computers.
[3] Ayal Zaks,et al. Outer-loop vectorization - revisited for short SIMD architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[4] M. Suzuoki,et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor , 2006, IEEE Journal of Solid-State Circuits.
[5] Ayal Zaks,et al. Auto-vectorization of interleaved data for SIMD , 2006, PLDI '06.
[6] B. Flachs,et al. A streaming processing unit for a CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.
[7] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[8] Roger Espasa,et al. Conflict-free accesses to strided vectors on a banked cache , 2005, IEEE Transactions on Computers.
[9] Chunyang Gou,et al. SAMS: single-affiliation multiple-stride parallel memory scheme , 2008, MAW '08.
[10] Eduard Ayguadé,et al. Conflict-Free Access for Streams in Multimodule Memories , 1995, IEEE Trans. Computers.
[11] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev.
[12] David T. Harper,et al. Increased Memory Performance During Vector Accesses Through the use of Linear Address Transformations , 1992, IEEE Trans. Computers.
[13] J. Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[14] Juergen Pille,et al. The Vector Fixed Point Unit of the Synergistic Processor Element of the Cell Architecture Processor , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[15] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[16] Mateo Valero,et al. Command vector memory systems: high performance at low cost , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[17] S.H. Dhong,et al. A 4.8GHz fully pipelined embedded SRAM in the streaming processor of a CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.
[18] James Smith,et al. A Simulation Study of the CRAY X-MP Memory System , 1986, IEEE Transactions on Computers.
[19] Khaled Z. Ibrahim,et al. Implementing Wilson-Dirac operator on the cell broadband engine , 2008, ICS '08.
[20] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[21] Mateo Valero,et al. Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[22] David T. Harper,et al. Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems , 1991, IEEE Trans. Parallel Distributed Syst..
[23] Gang Ren,et al. Optimizing data permutations for SIMD devices , 2006, PLDI '06.
[24] Daehyun Kim,et al. Architectural support for uniprocessor and multiprocessor active memory systems , 2004, IEEE Transactions on Computers.
[25] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[26] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.