Paper information - A technique for high bandwidth and deterministic low latency load/store accesses to multiple cache banks

One of the problems in future processors will be the resource conflicts caused by several load/store units competing to access the same cache bank. The traditional approach to handling this case is to introduce buffers combined with a crossbar. This approach suffers from (i) the non-deterministic latency of a load/store and (ii) the extra latency caused by the crossbar and the buffer management. A deterministic latency is of the utmost importance for the forwarding mechanism of out-of-order processors because it enables back-to-back execution of dependent instructions. We propose a technique that eliminates the buffers and crossbars from the critical path of load/store execution. The result is a latency that is both low and deterministic. Our solution consists of predicting which bank is to be accessed. A penalty is incurred only when the prediction is wrong.
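The abstract does not say how the bank is predicted, so the following is only a minimal sketch of the general idea, not the authors' mechanism. It assumes a hypothetical "last bank" predictor indexed by the instruction address, line-interleaved banks, and made-up latency numbers; the names `LastBankPredictor`, `bank_of`, and `access_latency`, and all constants, are illustrative assumptions.

```python
from dataclasses import dataclass, field

# All constants below are assumptions chosen for illustration only.
LINE_BYTES = 32          # assumed cache line size in bytes
NUM_BANKS = 4            # assumed number of cache banks
BASE_LATENCY = 2         # assumed cycles when the predicted bank is correct
MISPREDICT_PENALTY = 3   # assumed extra cycles after a wrong bank prediction


def bank_of(address: int) -> int:
    """Map an address to a bank by interleaving cache lines across banks."""
    return (address // LINE_BYTES) % NUM_BANKS


@dataclass
class LastBankPredictor:
    """Hypothetical predictor: guess the bank this instruction used last time."""
    table: dict = field(default_factory=dict)

    def predict(self, pc: int) -> int:
        # Instructions not seen before default to bank 0.
        return self.table.get(pc, 0)

    def update(self, pc: int, actual_bank: int) -> None:
        self.table[pc] = actual_bank


def access_latency(predictor: LastBankPredictor, pc: int, address: int) -> int:
    """Return the latency of one load/store under the assumed cost model."""
    predicted = predictor.predict(pc)
    actual = bank_of(address)
    predictor.update(pc, actual)
    if predicted == actual:
        return BASE_LATENCY                    # fixed latency on a correct guess
    return BASE_LATENCY + MISPREDICT_PENALTY   # penalty only on a wrong guess
```

Under these assumptions, every correctly predicted access sees the same fixed latency, and only a misprediction pays extra, which is the property the abstract argues matters for back-to-back scheduling.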

Koen De Bosschere | Henk Neefs | Hans Vandierendonck
