Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture
Eduard Ayguadé | Marc González | Xavier Martorell | Nikola Vujic
[1] S. Asano,et al. The design and implementation of a first-generation CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.
[2] Erik Brockmeyer,et al. A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[3] Balaram Sinharoy,et al. POWER5 system microarchitecture , 2005, IBM J. Res. Dev.
[4] Tao Zhang,et al. Prefetching irregular references for software cache on cell , 2008, CGO '08.
[5] Robert A. Walker,et al. Interrupt Triggered Software Prefetching for Embedded CPU Instruction Cache , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).
[6] Jack Dongarra,et al. Introduction to the HPCChallenge Benchmark Suite , 2004.
[7] Bronis R. de Supinski,et al. The OpenMP Memory Model , 2005, IWOMP.
[8] Tao Zhang,et al. Orchestrating data transfer for the cell/B.E. processor , 2008, ICS '08.
[9] Tien-Fu Chen. An effective programmable prefetch engine for on-chip caches , 1995, MICRO 1995.
[10] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[11] B. R. Rau,et al. Code generation schema for modulo scheduled loops , 1992, MICRO 1992.
[12] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[13] Eduard Ayguadé,et al. Hybrid access-specific software cache techniques for the cell BE architecture , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[14] Michael Gschwind,et al. Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture , 2006, IBM Syst. J.
[15] Fabrizio Petrini,et al. Cell Multiprocessor Communication Network: Built for Speed , 2006, IEEE Micro.
[16] Eduard Ayguadé,et al. Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture , 2010.
[17] Wen-mei W. Hwu,et al. Modulo scheduling of loops in control-intensive non-numeric programs , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[18] Yunheung Paek,et al. Efficient and precise array access analysis , 2002, TOPLAS.
[19] Yale N. Patt,et al. Alternative implementations of hybrid branch predictors , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[20] H. Peter Hofstee,et al. Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.
[21] Martin Hopkins,et al. A novel SIMD architecture for the cell heterogeneous chip-multiprocessor , 2005, 2005 IEEE Hot Chips XVII Symposium (HCS).