On effective data supply for multi-issue processors

Emerging multi-issue microprocessors require an effective data supply to sustain the processing of multiple instructions per cycle. The data cache structure, the backbone of data supply, has traditionally been organized and managed as one large homogeneous resource, offering little flexibility for selective caching. While memory latency hiding techniques and multi-ported caches are critical to effective data supply, we show in this paper that even ideal non-blocking multi-ported caches are not sufficient by themselves to supply data. We evaluate an approach in which the first-level (L1) data cache is partitioned into multiple (multi-lateral) subcaches. The data reference stream of a running program is subdivided into two classes, and each class is mapped to a specific subcache whose management policy better suits the access pattern of that class. This selective organization and caching retains more useful data in the L1 cache, which translates to more cache hits, less contention on the cache-memory bus, and an overall improvement in execution time. Our simulations show that a multi-lateral L1 cache of (8+1) KB total size generally performs as well as, and in some cases better than, an ideal multi-ported 16 KB cache structure in supplying data.
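
The partitioning idea can be illustrated with a toy simulation. The following Python sketch is only a minimal illustration under stated assumptions and is not the simulator or the policies used in the paper: each subcache is modeled as fully associative with LRU replacement, the block size is assumed to be 32 bytes, and the `classify` function that assigns each reference to a class is a hypothetical stand-in, since the abstract does not specify how references are classified. The names `SubCache`, `MultiLateralCache`, `demo_trace` and `count_hits` are likewise invented for this example.

```python
from collections import OrderedDict


class SubCache:
    """Toy fully associative LRU cache (a simplifying assumption)."""

    def __init__(self, size_bytes, block_bytes=32):
        self.block_bytes = block_bytes
        self.capacity = size_bytes // block_bytes  # capacity in blocks
        self.blocks = OrderedDict()  # block number -> True, in LRU order

    def access(self, address):
        """Return True on a hit; on a miss, fill the block and return False."""
        block = address // self.block_bytes
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = True
        return False


class MultiLateralCache:
    """Two subcaches; a classifier routes each reference to one of them."""

    def __init__(self, classify, main_bytes=8 * 1024, side_bytes=1024):
        self.classify = classify  # hypothetical: address -> "main" or "side"
        self.subcaches = {
            "main": SubCache(main_bytes),
            "side": SubCache(side_bytes),
        }

    def access(self, address):
        return self.subcaches[self.classify(address)].access(address)


STREAM_BASE = 1 << 20  # assumed start address of the streaming region


def demo_trace(rounds=4, reused_blocks=256, block_bytes=32):
    """Interleave a reused 8 KB working set with never-reused streaming data."""
    trace = []
    stream = STREAM_BASE
    for _ in range(rounds):
        for i in range(reused_blocks):
            trace.append(i * block_bytes)  # reference with reuse
            trace.append(stream)           # streaming reference, no reuse
            stream += block_bytes
    return trace


def count_hits(cache, trace):
    return sum(1 for address in trace if cache.access(address))


if __name__ == "__main__":
    trace = demo_trace()
    unified = SubCache(9 * 1024)
    split = MultiLateralCache(
        lambda a: "side" if a >= STREAM_BASE else "main"
    )
    print("unified 9 KB hits:", count_hits(unified, trace))
    print("(8+1) KB multi-lateral hits:", count_hits(split, trace))
```

In this contrived trace the streaming references evict the reused blocks from the unified cache before they are touched again, whereas the partitioned organization confines them to the small subcache. Real workloads and realistic classifiers will behave less cleanly, so the sketch shows only the mechanism, not the magnitude of the benefit reported in the paper.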
