Widening resources: a cost-effective technique for aggressive ILP architectures
[1] Josep Llosa, et al. Resource widening versus replication: limits and performance-cost trade-off, 1998, ICS '98.
[2] Tadashi Watanabe. The NEC SX-3 supercomputer system, 1991, COMPCON Spring '91 Digest of Papers.
[3] Todd M. Austin, et al. High-Bandwidth Address Translation for Multiple-Issue Processors, 1996, ISCA.
[4] A. Gonzalez, et al. Hypernode reduction modulo scheduling, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[5] Josep Llosa, et al. Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs, 1997, ICS '97.
[6] Nikil D. Dutt, et al. Partitioned register files for VLIWs: a preliminary analysis of tradeoffs, 1992, MICRO 25.
[7] Geoffrey C. Fox, et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers, 1989, Int. J. High Perform. Comput. Appl.
[8] Corinna G. Lee, et al. Code optimizers and register organizations for vector architectures, 1992.
[9] B. Ramakrishna Rau, et al. Register allocation for software pipelined loops, 1992, PLDI '92.
[10] Norman P. Jouppi, et al. CACTI: an enhanced cache access and cycle time model, 1996, IEEE J. Solid State Circuits.
[11] Sam Harrell, et al. The national technology roadmap for semiconductors and SEMATECH future directions, 1996.
[12] Norman P. Jouppi, et al. Memory-System Design Considerations for Dynamically-Scheduled Processors, 1997, ISCA.
[13] Steven W. White, et al. POWER2: Next generation of the RISC System/6000 family, 1994, IBM J. Res. Dev.
[14] Josep Llosa, et al. Heuristics for register-constrained software pipelining, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[15] R. D. Jolly, et al. A 9-ns, 1.4-gigabyte/s, 17-ported CMOS register file, 1991.
[16] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.
[17] Kunle Olukotun, et al. The case for a single-chip multiprocessor, 1996, ASPLOS VII.
[18] Josep Llosa, et al. Modulo Scheduling with Reduced Register Pressure, 1998, IEEE Trans. Computers.
[19] B. Ramakrishna Rau, et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, 1981, MICRO 14.
[20] P. Chow, et al. Memory-system Design Considerations For Dynamically-scheduled Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[21] Olivier Temam, et al. Data caches for superscalar processors, 1997, ICS '97.