The combined effectiveness of unimodular transformations, tiling, and software prefetching
Jacqueline Chame | Sungdo Moon | Daeyeon Park | Weihua Mao | Rafael H. Saavedra
[1] King-Sun Fu,et al. Data Coherence Problem in a Multicache System , 1985, IEEE Transactions on Computers.
[2] M. Hill,et al. Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[3] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[4] W. Jalby,et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93.
[5] Jack Dongarra,et al. LAPACK: a portable linear algebra library for high-performance computers , 1990, SC.
[6] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput.
[7] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst.
[8] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[9] B J Smith,et al. A pipelined, shared resource MIMD computer , 1986 .
[10] Ken Kennedy,et al. Improving register allocation for subscripted variables , 1990, PLDI '90.
[11] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[12] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[13] Michel Dubois,et al. Concurrent Miss Resolution in Multiprocessor Caches , 1988, ICPP.
[14] Calvin K. Tang. Cache system design in the tightly coupled multiprocessor system , 1976, AFIPS '76.
[15] Utpal Banerjee,et al. Dependence analysis for supercomputing , 1988, The Kluwer international series in engineering and computer science.
[16] Pen-Chung Yew,et al. Data Prefetching In Shared Memory Multiprocessors , 1987, ICPP.
[17] Robert H. Halstead,et al. MASA: a multithreaded processor architecture for parallel symbolic computing , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.
[18] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[19] Anant Agarwal,et al. APRIL: a processor architecture for multiprocessing , 1990, ISCA '90.
[20] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[21] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[22] W. K. George,et al. University of Illinois at Urbana-Champaign , 1997 .
[23] Walid Abu-Sufah,et al. Improving the performance of virtual memory computers. , 1979 .
[24] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[25] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.
[26] Olivier Temam,et al. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93. Proceedings.
[27] Ken Kennedy,et al. Automatic translation of FORTRAN programs to vector form , 1987, TOPL.
[28] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[29] .May ..id. University of Illinois , 1919, The Grants Register 2021.
[30] Jack J. Dongarra,et al. Solving linear systems on vector and shared memory computers , 1990 .
[31] Michel Dubois,et al. Synchronization, coherence, and event ordering in multiprocessors , 1988, Computer.
[32] John Randal Allen,et al. Dependence analysis for subscripted variables and its application to program transformations , 1983 .