MoDe-X: Microarchitecture of a Layout-Aware Modular Decoupled Crossbar for On-Chip Interconnects

The number of cores on a single chip keeps increasing with process technology scaling, which requires a scalable interconnection network topology. Buffered wormhole-switched interconnect architectures are attractive for such multicore architectures. In this context, the 2D mesh on-chip interconnect provides a scalable, cost-efficient, flexible, and reliable next-generation interconnect topology. In this paper, we present a microarchitecture for a power- and area-efficient router for a 2D mesh interconnect. We propose an efficient crossbar implementation, called MoDe-X, that strikes a reasonable power-performance tradeoff. The MoDe-X router uses a Modular-Decoupled Crossbar (MoDe-X) that incorporates dimensional decomposition and segmentation to achieve power and area savings. Unlike most prior work, which considers only the logical representation of crossbars, MoDe-X is a physically aware router that accounts for the actual layout of router components to reflect practical design requirements. Our simulation results and power estimates show that the MoDe-X router architectures can reduce overall router area by up to 40 percent and power consumption by up to 35 percent, with very little performance impact, which occurs only at higher loads. Further, with aggressive power-gating techniques, the net power reduction can be as much as 99 percent for some workloads with no additional performance impact.
