Resource widening versus replication: limits and performance-cost trade-off
[1] R. D. Jolly,et al. A 9-ns, 1.4-gigabyte/s, 17-ported CMOS register file , 1991 .
[2] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[3] A. Gonzalez,et al. Hypernode reduction modulo scheduling , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[4] Manoj Franklin,et al. High-bandwidth data memory systems for superscalar processors , 1991 .
[5] Vicki H. Allan,et al. Software pipelining: a comparison and improvement , 1990, Proceedings of the 23rd Annual Workshop and Symposium on Microprogramming and Microarchitecture (MICRO 23).
[6] Gurindar S. Sohi,et al. High-bandwidth data memory systems for superscalar processors , 1991, ASPLOS IV.
[7] David Chih-Wei Chang,et al. Microarchitecture of HaL's memory management unit , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[8] B. Ramakrishna Rau,et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing , 1981, MICRO 14.
[9] Kamran Eshraghian,et al. Principles of CMOS VLSI Design: A Systems Perspective , 1985 .
[10] John H. Edmondson,et al. Superscalar instruction execution in the 21164 Alpha microprocessor , 1995, IEEE Micro.
[11] Kunle Olukotun,et al. The case for a single-chip multiprocessor , 1996, ASPLOS VII.
[12] Josep Llosa,et al. Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs , 1997, ICS '97.
[13] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[14] Corinna G. Lee,et al. Code optimizers and register organizations for vector architectures , 1992 .