Resource-Aware Compiler Prefetching for Many-Cores
[1] Xuejun Yang et al. Improving the Performance of GCC by Exploiting IA-64 Architectural Features, 2005, Asia-Pacific Computer Systems Architecture Conference.
[2] Uzi Vishkin et al. FPGA-based prototype of a PRAM-on-chip processor, 2008, CF '08.
[3] Dean M. Tullsen et al. Effective cache prefetching on bus-based multiprocessors, 1995, TOCS.
[4] Henry M. Levy et al. An architecture for software-controlled data prefetching, 1991, ISCA '91.
[5] K. Kavi. Cache Memories. Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance, 2022.
[6] Sanguthevar Rajasekaran et al. Handbook of Parallel Computing: Models, Algorithms and Applications, 2007.
[7] Norman P. Jouppi et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[8] Gang Qu et al. Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing, 2007, 15th Annual IEEE Symposium on High-Performance Interconnects (HOTI 2007).
[9] Uzi Vishkin et al. Is teaching parallel algorithmic thinking to high school students possible? One teacher's experience, 2010, SIGCSE.
[10] Jiang Zhu et al. Building a RCP (Rate Control Protocol) Test Network, 2007.
[11] Ken Kennedy et al. Compiler support for software prefetching, 1998.
[12] Amos R. Omondi et al. Advances in Computer Systems Architecture, 2003, Lecture Notes in Computer Science.
[13] Anoop Gupta et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[14] Michel Dubois et al. Sequential Hardware Prefetching in Shared-Memory Multiprocessors, 1995, IEEE Trans. Parallel Distributed Syst.
[15] George C. Caragea et al. Brief announcement: performance potential of an easy-to-program PRAM-on-chip prototype versus state-of-the-art processor, 2009, SPAA '09.
[16] Josep Torrellas et al. Scalable Cache Miss Handling for High Memory-Level Parallelism, 2006, 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[17] Uzi Vishkin et al. Towards a First Vertical Prototyping of an Extremely Fine-Grained Parallel Programming Approach, 2003, Theory of Computing Systems.
[18] Wei-Fen Lin et al. Reducing DRAM latencies with an integrated memory hierarchy design, 2001, Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA).
[19] George C. Caragea et al. Models for Advancing PRAM and Other Algorithms into Parallel Programs for a PRAM-On-Chip Platform, 2006, Handbook of Parallel Computing.
[20] Todd C. Mowry et al. Tolerating latency in multiprocessors through compiler-inserted prefetching, 1998, TOCS.
[21] Uzi Vishkin et al. A pilot study to compare programming effort for two parallel programming models, 2007, J. Syst. Softw.
[22] Todd C. Mowry et al. Tolerating latency through software-controlled data prefetching, 1994.
[23] George C. Caragea et al. General-Purpose vs. GPU: Comparison of Many-Cores on Irregular Workloads, 2010.
[24] Gang Qu et al. Layout-Accurate Design and Implementation of a High-Throughput Interconnection Network for Single-Chip Parallel Processing, 2007.
[25] Scott A. Mahlke et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching, 1991, MICRO 24.