Interconnect-limited VLSI architecture

As semiconductor technology scales, wires are becoming the dominant factor in determining system performance and power dissipation. By 2008, crossing the chip is expected to take 16 clock cycles. Modern superscalar architectures that depend on global register files, global bypass structures, and global instruction-issue logic are poorly matched to future semiconductor technology, which demands architectures that exploit locality and minimize global communication. In this paper, we describe three approaches to developing architectures that are well matched to interconnect-limited technology. These architectures reduce global communication by clustering execution resources with their data and instruction storage and by extending the storage hierarchy down to the level of individual ALUs. They also use global interconnect more efficiently by organizing it as a regular network rather than as a collection of ad hoc dedicated wires.
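To make the last point concrete, here is a minimal sketch that assumes the clusters sit in a k×k two-dimensional mesh. The abstract does not name a topology; the mesh is chosen here only for illustration, and all function names are hypothetical. The sketch compares the number of channels needed when every pair of clusters gets its own dedicated wires with the number of shared links in the mesh. It also reports the hop counts that messages pay in exchange for sharing those links.

```python
from itertools import combinations, product


def dedicated_wire_count(n: int) -> int:
    # One dedicated point-to-point channel per unordered pair of clusters.
    return n * (n - 1) // 2


def mesh_nodes(k: int):
    # Coordinates of the k*k clusters in a hypothetical 2D mesh.
    return list(product(range(k), repeat=2))


def mesh_link_count(k: int) -> int:
    # Each of the k rows and k columns contributes (k - 1) neighbor links.
    return 2 * k * (k - 1)


def manhattan(a, b) -> int:
    # Minimal (e.g. dimension-order) routing in a mesh takes |dx| + |dy| hops.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mesh_hop_stats(k: int):
    # Return (max, mean) hop count over all distinct cluster pairs; needs k >= 2.
    dists = [manhattan(a, b) for a, b in combinations(mesh_nodes(k), 2)]
    return max(dists), sum(dists) / len(dists)


if __name__ == "__main__":
    for k in (2, 4, 8):
        n = k * k
        worst, mean = mesh_hop_stats(k)
        print(f"{n:3d} clusters: {dedicated_wire_count(n):5d} dedicated channels "
              f"vs {mesh_link_count(k):4d} mesh links; "
              f"worst case {worst} hops, mean {mean:.2f} hops")
```

When run, the dedicated-channel count grows roughly as N²/2 in the number of clusters N, while the mesh grows roughly as 2N. The cost is that a message may cross up to 2(k-1) links. The sketch ignores wire length, which is the quantity the abstract argues dominates. It therefore illustrates only the sharing argument, not delay.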
