Increasing Memory Bandwidth for Vector Computations
[1] V. Klema. LINPACK user's guide , 1980 .
[2] B. Ramakrishna Rau,et al. Pseudo-randomly interleaved memory , 1991, ISCA '91.
[3] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[4] Henry M. Levy,et al. An Architecture for Software-Controlled Data Prefetching , 1991, ISCA.
[5] Eduard Ayguadé,et al. Increasing the number of strides for conflict-free vector access , 1992, ISCA '92.
[6] Wm. A. Wulf. Evaluation of the WM architecture , 1992, ISCA '92.
[7] Gurindar S. Sohi,et al. High-bandwidth data memory systems for superscalar processors , 1991, ASPLOS IV.
[8] Arthur B. Maccabe. Computer Systems: Architecture, Organization, and Programming , 1993 .
[9] King Lee. On the Floating Point Performance of the I860™ Microprocessor , 1992, Int. J. High Speed Comput.
[10] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[11] Steven A. Moyer,et al. Access Ordering and Effective Memory Bandwidth , 1993 .
[12] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[13] Steven A. Moyer,et al. Performance of the IPSC/860 Node Architecture , 1991 .
[14] Ivan Sklenar. Prefetch unit for vector operations on scalar computers (abstract) , 1992, ISCA '92.
[15] Manuel E. Benitez,et al. Code generation for streaming: an access/execute mechanism , 1991, ASPLOS IV.
[16] Maccabe. Computer Systems , 1993 .
[17] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[18] Kai Hwang,et al. Computer architecture and parallel processing , 1984, McGraw-Hill Series in computer organization and architecture.
[19] Stephen K. Jones,et al. Optimization and Simulation of Two Classes of Nonresetting Data Reconstructors , 1971, IEEE Transactions on Computers.
[20] Rajiv Gupta,et al. Compile-time techniques for efficient utilization of parallel memories , 1988, PPEALS '88.
[21] Allen D. Malony,et al. Performance prediction of loop constructs on multiprocessor hierarchical-memory systems , 1989, ICS '89.
[22] T. H. Meyer. Computer Architecture and Organization , 1982 .
[23] James E. Smith,et al. The ZS-1 central processor , 1987, ASPLOS.
[24] M. Morris Mano,et al. Computer system architecture , 1982 .
[25] Janak H. Patel,et al. Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.
[26] F. H. Mcmahon,et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range , 1986 .
[27] Paul Budnik,et al. The Organization and Use of Parallel Memories , 1971, IEEE Transactions on Computers.
[28] Gene H. Golub,et al. Scientific computing: an introduction with parallel computing , 1993 .
[29] Ivan Tomek. Foundations of computer architecture and organization , 1990 .
[30] Ivan Sklenár. Prefetch unit for vector operations on scalar computers , 1992, CARN.
[31] B. Parasuraman. High-performance microprocessor architectures , 1976, Proceedings of the IEEE.
[32] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[33] R. J. Chevance,et al. An evaluation methodology for microprocessor and system architecture , 1992, CARN.
[34] Andrew R. Pleszkun,et al. PIPE: a VLSI decoupled architecture , 1985, ISCA '85.
[35] Steven J. Wallach. The CONVEX C-1 64-bit Supercomputer , 1986, COMPCON.
[36] Monica D. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991 .
[37] Ken Kennedy,et al. Blocking Linear Algebra Codes for Memory Hierarchies , 1989, PPSC.
[38] David T. Harper,et al. Vector Access Performance in Parallel Memories Using a Skewed Storage Scheme , 1987, IEEE Transactions on Computers.