Limited bandwidth to affect processor design

This paper quantifies and compares the performance impacts of memory latencies and finite bandwidth. We show that the implementation of aggressive latency tolerance techniques aggravates stalls due to finite memory bandwidth, which actually become more significant than stalls resulting from uncongested memory latency alone. We expect that memory bandwidth limitations across the processor pins will drive significant architectural change. An execution-driven simulation measures the time that several SPEC95 benchmarks spend stalled for memory latency, limited-memory bandwidth and computing.

[1]  Robert P. Colwell,et al.  A VLIW architecture for a trace scheduling compiler , 1987, ASPLOS.

[2]  Gurindar S. Sohi,et al.  Instruction Issue Logic for High-Performance Interruptible, Multiple Functional Unit, Pipelines Computers , 1990, IEEE Trans. Computers.

[3]  David A. Wood,et al.  A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.

[4]  Arvin Park,et al.  Dynamic base register caching: a technique for reducing address bus width , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[5]  A. Seznec,et al.  Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[6]  André Seznec,et al.  Decoupled sectored caches: conciliating low tag implementation cost , 1994, ISCA '94.

[7]  John Paul Shen,et al.  A limit study of local memory requirements using value reuse profiles , 1995, MICRO 1995.

[8]  Nancy J. Warter-Perez,et al.  Modulo scheduling with multiple initiation intervals , 1995, MICRO 28.

[9]  Gary S. Tyson,et al.  A modified approach to data cache management , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[10]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[11]  Larry Rudolph,et al.  Creating a wider bus using caching techniques , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[12]  D. Burger,et al.  Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[13]  D. Burger,et al.  Datascalar Architectures , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.