Reconfigurable hybrid interconnection for static and dynamic scientific applications

As we enter the era of peta-scale computing, system architects must plan for machines composed of tens or even hundreds of thousands of processors. Although fully connected networks such as fat-tree configurations currently dominate HPC interconnect designs, such approaches are inadequate for ultra-scale concurrencies because their component costs grow superlinearly with node count. Traditional low-degree interconnect topologies, such as 3D tori, have reemerged as a competitive solution because their component count scales linearly with node count; however, such networks are poorly suited to the requirements of many scientific applications at extreme concurrencies. To address these limitations, we propose HFAST, a hybrid switch architecture that uses circuit switches to dynamically reconfigure lower-degree interconnects to suit the topological requirements of a given scientific application. This work presents several new research contributions. We develop an optimization strategy for HFAST mappings and demonstrate that efficiency gains can be attained across a broad range of static numerical computations. Additionally, we conduct an extensive analysis of the communication characteristics of a dynamically adapting mesh calculation and show that the HFAST approach can achieve significant advantages, even when compared with traditional fat-tree configurations. Overall, the results point to the potential of hybrid reconfigurable networks for interconnecting future peta-scale architectures, for both static and dynamically adapting applications.
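The scaling contrast between fat-trees and 3D tori can be illustrated with a rough port-count estimate. The sketch below is not taken from the paper: it assumes the textbook model of a full-bisection fat-tree built from identical radix-k switches, in which L levels support up to 2(k/2)^L nodes and need about 2L - 1 switch ports per node, and it assumes a 3D torus with six links per node. The function names and the radix of 16 are hypothetical choices for illustration only.

```python
def fat_tree_levels(num_nodes: int, switch_radix: int) -> int:
    """Levels needed by a full-bisection fat-tree of radix-k switches.

    Assumed model: L levels support up to 2 * (k/2)**L nodes.
    """
    half = switch_radix // 2
    levels = 1
    capacity = switch_radix  # one switch (L = 1) connects k nodes
    while capacity < num_nodes:
        levels += 1
        capacity *= half
    return levels


def fat_tree_ports(num_nodes: int, switch_radix: int) -> int:
    """Approximate total switch ports: (2L - 1) ports per node."""
    levels = fat_tree_levels(num_nodes, switch_radix)
    return num_nodes * (2 * levels - 1)


def torus3d_ports(num_nodes: int) -> int:
    """Router ports in a 3D torus: six links (+/-x, +/-y, +/-z) per node."""
    return 6 * num_nodes


if __name__ == "__main__":
    radix = 16  # hypothetical switch radix
    for n in (128, 1024, 8192, 65536, 524288):
        per_node = fat_tree_ports(n, radix) // n
        print(f"{n:>7} nodes: fat-tree {per_node} ports/node, 3D torus 6 ports/node")
```

Under these assumptions the fat-tree's per-node port count rises with every added level, so total ports grow roughly as N log N, while the torus stays at a constant six ports per node. Actual costs depend on switch radix, cabling, and packaging, none of which this estimate models.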
