Design Exploration of Multi-tier Interconnection Networks for Exascale Systems

Interconnection networks are one of the main limiting factors when scaling out computing systems. In this paper, we explore the role that the hybridization of topologies plays in the design of a state-of-the-art exascale-capable computing system. More precisely, we compare several hybrid topologies with common single-topology ones under large-scale, application-like traffic. In addition, we explore how different aspects of the hybrid topology affect the overall performance of the system. In particular, we found that hybrid topologies can outperform state-of-the-art torus and fat-tree networks as long as the density of connections is high enough (one connection every two or four nodes seems to be the sweet spot) and the size of the subtori is limited to a few nodes per dimension. Moreover, we explored two alternatives for the upper tiers of the interconnect, a fat-tree and a generalised hypercube, and found little difference between them; which one performs better depends mostly on the workload to be executed.
