Paper information - Increasing cache bandwidth using multi-port caches for exploiting ILP in non-numerical code

Increasing cache bandwidth using multi-port caches for exploiting ILP in non-numerical code

Modern microprocessors that exploit instruction-level parallelism (ILP) require higher data cache bandwidth than sequential machines, since at least as many cache references (or more, due to speculative execution) must be made in fewer clock cycles. To support the required bandwidth, multiport caches have been proposed, which allow multiple load/store instructions to execute in a single cycle. Deciding the minimum number of cache ports that does not significantly degrade performance is one of the major resource allocation problems of ILP machines. Unfortunately, this decision is difficult to make for machines that exploit ILP in non-numerical code, because such code is irregular. In this short paper, a comprehensive empirical study is performed on selected integer benchmarks using an aggressive ILP compiler, with the aim of characterising a suitable number of cache ports and evaluating the metric of cache bandwidth. This study differs from previous ones in that the compiler can successfully exploit ILP in proportion to the amount of resources, and so measures the performance impact of multiport caches more accurately. The results indicate that multiport caches that provide high bandwidth significantly improve the performance of ILP machines (by as much as 50% in geometric mean), yet there is a consistent upper bound on the number of required ports.

Soo-Mook Moon

[1] Thomas M. Conte. Tradeoffs in processor/memory interfaces for superscalar processors, 1992, MICRO 1992.

[2] B. Nadeau-Dostie, et al. A 200 MHz 0.8 μm BiCMOS Modular Memory Family Of DRAM And Multiport SRAM, 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.

[3] Kemal Ebcioglu, et al. A study on the number of memory ports in multiple instruction issue machines, 1993, MICRO 1993.

[4] Dionisios N. Pnevmatikatos, et al. Cache performance of the SPEC92 benchmark suite, 1993, IEEE Micro.

[5] Gurindar S. Sohi, et al. High-bandwidth data memory systems for superscalar processors, 1991, ASPLOS IV.

[6] Jeff Yetter, et al. Performance features of the PA7100 microprocessor, 1993, IEEE Micro.

[7] Mike Johnson, et al. Superscalar microprocessor design, 1991, Prentice Hall series in innovative technology.

[8] Soo-Mook Moon. Compile-time parallelization of non-numerical code: VLIW superscalar, 1993.

[9] Andrew Wolfe, et al. Two-ported cache alternatives for superscalar processors, 1993, MICRO 1993.

[10] Soo-Mook Moon, et al. Generalized Multiway Branch Unit for VLIW Microprocessors, 1995, IEEE Trans. Parallel Distributed Syst.