A novel dimensionally-decomposed router for on-chip communication in 3D architectures

Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D) chip structures are envisioned as a viable solution to skyrocketing transistor densities and burgeoning die sizes in multi-core architectures. Partitioning a larger die into smaller segments and then stacking them in a 3D fashion can significantly reduce latency and energy consumption. Such benefits emanate from the notion that inter-wafer distances are negligible compared to intra-wafer distances. This attribute substantially reduces global wiring length in 3D chips. The work in this paper integrates the increasingly popular idea of packet-based Networks-on-Chip (NoC) into a 3D setting. While NoCs have been studied extensively in the 2D realm, the microarchitectural ramifications of moving into the third dimension have yet to be fully explored. This paper presents a detailed exploration of inter-strata communication architectures in 3D NoCs. Three design options are investigated; a simple bus-based inter-wafer connection, a hop-by-hop standard 3D design, and a full 3D crossbar implementation. In this context, we propose a novel partially-connected 3D crossbar structure, called the 3D Dimensionally-Decomposed (DimDe) Router, which provides a good tradeoff between circuit complexity and performance benefits. Simulation results using (a) a stand-alone cycle-accurate 3D NoC simulator running synthetic workloads, and (b) a hybrid 3D NoC/cache simulation environment running real commercial and scientific benchmarks, indicate that the proposed DimDe design provides latency and throughput improvements of over 20% on average over the other 3D architectures, while remaining within 5% of the full 3D crossbar performance. Furthermore, based on synthesized hardware implementations in 90 nm technology, the DimDe architecture outperforms all other designs -- including the full 3D crossbar -- by an average of 26% in terms of the Energy-Delay Product (EDP).

[1]  Kaustav Banerjee,et al.  Introspective 3D chips , 2006, ASPLOS XII.

[2]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[3]  Li Shang,et al.  In-Network Cache Coherence , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[4]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[5]  William J. Dally,et al.  Microarchitecture of a high radix router , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[6]  Jason Cong,et al.  Thermal via planning for 3-D ICs , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[7]  Kenneth Rose,et al.  First-order performance prediction of cache memory with wafer-level 3D integration , 2005, IEEE Design & Test of Computers.

[8]  Sachin Sapatnekar,et al.  Efficient Thermal Placement of Standard Cells in 3D ICs using a Force Directed Approach , 2003, ICCAD 2003.

[9]  Narayanan Vijaykrishnan,et al.  Interconnect and thermal-aware floorplanning for 3D microprocessors , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[10]  Li Shang,et al.  Thermal Modeling, Characterization and Management of On-Chip Networks , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[11]  Ken Mai,et al.  The future of wires , 2001, Proc. IEEE.

[12]  Simon W. Moore,et al.  Low-latency virtual-channel routers for on-chip networks , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[13]  José Duato,et al.  A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks , 1993, IEEE Trans. Parallel Distributed Syst..

[14]  William J. Dally,et al.  The torus routing chip , 2005, Distributed Computing.

[15]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[16]  Gabriel H. Loh,et al.  Implementing caches in a 3D technology for high performance processors , 2005, 2005 International Conference on Computer Design.

[17]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[18]  Jian Xu,et al.  Demystifying 3D ICs: the pros and cons of going vertical , 2005, IEEE Design & Test of Computers.

[19]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[20]  Luca Benini,et al.  Networks on Chips : A New SoC Paradigm , 2022 .

[21]  J. Meindl,et al.  Wafer-level microfluidic cooling interconnects for GSI , 2005, Proceedings of the IEEE 2005 International Interconnect Technology Conference, 2005..

[22]  William J. Dally,et al.  The J-machine network , 1992, Proceedings 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors.

[23]  Krste Asanovic,et al.  Replacing global wires with an on-chip network: a power analysis , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[24]  Martin Burtscher,et al.  Bridging the processor-memory performance gap with 3D IC technology , 2005, IEEE Design & Test of Computers.

[25]  H. Sarbazi-Azad,et al.  Analytical modelling of wormhole-routed k-ary n-cubes in the presence of matrix-transpose traffic , 2003, J. Parallel Distributed Comput..

[26]  Dean M. Tullsen,et al.  Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[27]  David A. Wood,et al.  Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[28]  Mahmut T. Kandemir,et al.  Design and Management of 3D Chip Multiprocessors Using Network-in-Memory , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[29]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[30]  Chita R. Das,et al.  A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[31]  R. E. Kessler,et al.  Cray T3D: a new dimension for Cray Research , 1993, Digest of Papers. Compcon Spring.

[32]  William J. Bowhill,et al.  Design of High-Performance Microprocessor Circuits , 2001 .

[33]  NarayananVijaykrishnan,et al.  A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks , 2006 .

[34]  Radu Marculescu Networks-on-chip: the quest for on-chip fault-tolerant communication , 2003, IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings..

[35]  Krisztián Flautner,et al.  PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.

[36]  William J. Dally,et al.  Virtual-channel flow control , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.