Dynamic Access Ordering for Streamed Computations
Sally A. McKee | William A. Wulf | Dee A. B. Weikle | James H. Aylor | Sung I. Hong | Maximo H. Salinas | Robert H. Klenke
[1] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[2] Leigh Stoller,et al. Increasing TLB reach using superpages backed by shadow memory , 1998, ISCA.
[3] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[4] Kevin P. McAuliffe,et al. Automatic Management of Programmable Caches , 1988, ICPP.
[5] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[6] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, ISCA 1990.
[7] Eduard Ayguadé,et al. Increasing the number of strides for conflict-free vector access , 1992 .
[8] Michel Dubois,et al. Sequential Hardware Prefetching in Shared-Memory Multiprocessors , 1995, IEEE Trans. Parallel Distributed Syst.
[9] Bruce R. Childers,et al. Memory bandwidth optimizations for wide-bus machines , 1993, [1993] Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences.
[10] Mateo Valero,et al. Command vector memory systems: high performance at low cost , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[11] José María Llabería,et al. Access order to avoid inter-vector-conflicts in complex memory systems , 1995, Proceedings of 9th International Parallel Processing Symposium.
[12] Jack W. Davidson,et al. Memory access coalescing: a technique for eliminating redundant memory accesses , 1994, PLDI '94.
[13] Ivan Sklenár. Prefetch unit for vector operations on scalar computers , 1992, CARN.
[14] David R. Cheriton,et al. Software-Controlled Caches in the VMP Multiprocessor , 1986, ISCA.
[15] R. Stephenson. A and V , 1962, The British journal of ophthalmology.
[16] Eduard Ayguadé,et al. Increasing the number of strides for conflict-free vector access , 1992, ISCA '92.
[17] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[18] Richard E. Hank,et al. An efficient architecture for loop based data preloading , 1992, MICRO 1992.
[19] B. Ramakrishna Rau,et al. Pseudo-randomly interleaved memory , 1991, ISCA '91.
[20] Q. S. Gao. The Chinese remainder theorem and the prime memory system , 1993, ISCA '93.
[21] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[22] Ivan Sklenar. Prefetch unit for vector operations on scalar computers (abstract) , 1992, ISCA '92.
[23] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[24] H. Levy,et al. An architecture for software-controlled data prefetching , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.
[25] David Blythe,et al. System Support for OpenGL Direct Rendering , 1995 .
[26] Manuel E. Benitez,et al. Code generation for streaming: an access/execute mechanism , 1991, ASPLOS IV.
[27] James R. Goodman,et al. The declining effectiveness of dynamic caching for general-purpose microprocessors , 1995 .
[28] Sally A. McKee,et al. Memory system support for image processing , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[29] Steven A. Moyer,et al. Access Ordering and Effective Memory Bandwidth , 1993 .
[30] Richard Uhlig,et al. Using Lookahead to reduce memory bank contention for decoupled operand references , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[31] Norman P. Jouppi,et al. Memory-System Design Considerations for Dynamically-Scheduled Processors , 1997, ISCA.
[32] David E. Culler,et al. Design challenges of virtual networks: fast, general-purpose communication , 1999, PPoPP '99.
[33] Sally A. McKee,et al. Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[34] David R. Cheriton,et al. Software-controlled caches in the VMP multiprocessor , 1986, ISCA 1986.
[35] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[36] Scott A. Mahlke,et al. An efficient architecture for loop based data preloading , 1992, MICRO.
[37] Scott A. Mahlke,et al. Tolerating data access latency with register preloading , 1992, ICS '92.
[38] Richard Crisp,et al. Direct RAMbus technology: the new main memory standard , 1997, IEEE Micro.
[39] Sally A. McKee,et al. Access order and effective bandwidth for streams on a Direct Rambus memory , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[40] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[41] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[42] David T. Harper,et al. Vector Access Performance in Parallel Memories Using a Skewed Storage Scheme , 1987, IEEE Transactions on Computers.
[43] Martin Walker,et al. A Shared Memory MPP from Cray Research , 1994, Digit. Tech. J.
[44] Sally A. McKee,et al. Maximizing memory bandwidth for streamed computations , 1996 .
[45] Henry M. Levy,et al. An Architecture for Software-Controlled Data Prefetching , 1991, ISCA.
[46] Henry M. Levy,et al. An architecture for software-controlled data prefetching , 1991, ISCA '91.
[47] Tzi-cker Chiueh,et al. Sunder: a programmable hardware prefetch architecture for numerical loops , 1994, Proceedings of Supercomputing '94.
[48] Janak H. Patel,et al. Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.
[49] F. H. Mcmahon,et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range , 1986 .
[50] Charles L. Seitz,et al. Myrinet: A Gigabit-per-Second Local Area Network , 1995, IEEE Micro.