Experimental implementation of dynamic access ordering
Sally A. McKee | William A. Wulf | James H. Aylor | Robert H. Klenke | Steven A. Moyer | Andrew J. Schwab | Charles Y. Hitchcock
[1] Gurindar S. Sohi, et al. High-bandwidth data memory systems for superscalar processors, 1991, ASPLOS IV.
[2] Janak H. Patel, et al. Data prefetching in multiprocessor vector cache memories, 1991, ISCA '91.
[3] Eduard Ayguadé, et al. Increasing the number of strides for conflict-free vector access, 1992, ISCA '92.
[4] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[5] Andrew R. Pleszkun, et al. PIPE: a VLSI decoupled architecture, 1985, ISCA '85.
[6] Randy H. Katz, et al. High Performance Microprocessor Architectures, 1990.
[7] Arthur B. Maccabe. Computer Systems: Architecture, Organization, and Programming, 1993.
[8] Rajiv Gupta, et al. Compile-time techniques for efficient utilization of parallel memories, 1988, PPoPP 1988.
[9] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.
[10] Manuel E. Benitez, et al. Code generation for streaming: an access/execute mechanism, 1991, ASPLOS IV.
[11] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[12] Steven A. Moyer, et al. Access Ordering and Effective Memory Bandwidth, 1993.
[13] David T. Harper, et al. Increased Memory Performance During Vector Accesses Through the use of Linear Address Transformations, 1992, IEEE Trans. Computers.
[14] Ivan Sklenar. Prefetch unit for vector operations on scalar computers (abstract), 1992, ISCA '92.
[15] John P. Hayes, et al. Computer Architecture and Organization, 1980.
[16] Ken Kennedy, et al. Software prefetching, 1991, ASPLOS IV.
[17] V. Klema. LINPACK user's guide, 1980.
[18] Henry M. Levy, et al. An architecture for software-controlled data prefetching, 1991, ISCA '91.
[19] B. Ramakrishna Rau, et al. Pseudo-randomly interleaved memory, 1991, ISCA '91.
[20] James E. Smith, et al. The ZS-1 central processor, 1987, ASPLOS 1987.
[21] Rajiv Gupta, et al. Compile-time techniques for efficient utilization of parallel memories, 1988, PPEALS '88.
[22] William A. Wulf, et al. Evaluation of the WM Architecture, 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[23] Andrew R. Pleszkun, et al. PIPE: a VLSI decoupled architecture, 1985, ISCA '85.
[24] Ken Kennedy, et al. Blocking Linear Algebra Codes for Memory Hierarchies, 1989, PPSC.
[25] Ivan Tomek. Foundations of computer architecture and organization, 1990.
[26] Jean-Loup Baer, et al. Computer systems architecture, 1980.
[27] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre, 1990, ISCA 1990.
[28] Ivan Sklenár. Prefetch unit for vector operations on scalar computers, 1992, CARN.
[29] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).