Energy-guided exploration of on-chip network design for exa-scale computing

Designing energy-efficient systems under tight performance and energy constraints becomes increasingly challenging for exascale computing. In particular, interconnecting hundreds of cores, caches, integrated memory and I/O controllers in energy efficient way stands out as a new challenge. This paper proposes hierarchical on-chip networks that take the proximity advantage between the cores in smaller clusters as a promising approach toward energy-efficient high performance computing. The design trade-offs of hierarchical interconnect architectures are studied using a fast and scalable design space exploration tool for exascale systems with number of cores in the order of thousands. In particular, we consider a system with 720 processing nodes and two-level network hierarchy. By supporting both traditional cache-based memory model and scratch pad memory (SPM) model, the target system architecture proves to be a good testbed for energy-guided exploration of hierarchical networks.

[1]  John Shalf,et al.  The International Exascale Software Project roadmap , 2011, Int. J. High Perform. Comput. Appl..

[2]  Christoforos E. Kozyrakis,et al.  Models and Metrics to Enable Energy-Efficiency Optimizations , 2007, Computer.

[3]  Chita R. Das,et al.  Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[4]  William J. Dally,et al.  Design tradeoffs for tiled CMP on-chip networks , 2006, ICS '06.

[5]  Zeshan Chishti,et al.  Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[6]  Shasi Kumar,et al.  A 2Tb/s 6×4 mesh network with DVFS and 2.3Tb/s/W router in 45nm CMOS , 2010, 2010 Symposium on VLSI Circuits.

[7]  John Shalf,et al.  Exascale Computing Technology Challenges , 2010, VECPAR.

[8]  B. Jacob,et al.  CMP $ im : A Pin-Based OnThe-Fly Multi-Core Cache Simulator , 2008 .

[9]  Saurabh Dighe,et al.  A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling , 2011, IEEE Journal of Solid-State Circuits.

[10]  Nicolas Ventroux,et al.  Hierarchical Network-on-Chip for Embedded Many-Core Architectures , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.

[11]  Partha Pratim Pande,et al.  Performance evaluation and design trade-offs for network-on-chip interconnect architectures , 2005, IEEE Transactions on Computers.

[12]  Dean M. Tullsen,et al.  Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[13]  Guang R. Gao,et al.  A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[14]  Natalie D. Enright Jerger,et al.  Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.

[15]  J. Koomey Worldwide electricity used in data centers , 2008 .

[16]  Andrew B. Kahng,et al.  ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[17]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[18]  Philip Heidelberger,et al.  Blue Gene/L torus interconnection network , 2005, IBM J. Res. Dev..

[19]  Luca Benini,et al.  Analysis of power consumption on switch fabrics in network routers , 2002, DAC '02.