Designing an exascale interconnect using multi-objective optimization

Exascale performance will be delivered by systems composed of millions of interconnected computing cores. The way these computing elements are connected to each other (the network topology) has a strong impact on many performance characteristics. In this work we propose a framework based on multi-objective optimization to explore possible network topologies for the EU-funded ExaNeSt project. The modular design of this system's interconnect provides great flexibility to design topologies optimized for specific performance targets such as communication locality, fault tolerance or energy consumption. The generation of topologies is formulated as a three-objective optimization problem, in which each objective is a topological characteristic to be minimized, and solutions are searched for using evolutionary techniques. The analysis of the results, carried out using simulation, shows that the generated topologies meet the required performance objectives. In addition, a comparison with a well-known topology reveals that the generated solutions can provide better topological characteristics and also higher performance for parallel applications.
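To make the three-objective formulation concrete, the following is a minimal sketch of how candidate topologies might be scored and filtered by Pareto dominance, the step that evolutionary multi-objective searches typically build on. Everything in it is an assumption for illustration: the abstract does not name the three objectives, so average distance, diameter and link count are used as stand-ins, and the helper names (`random_topology`, `objectives`, `dominates`, `pareto_front`) and the ring-plus-shortcuts generator are hypothetical. It is not the framework described in this work, and it omits the evolutionary loop (selection, crossover, mutation) entirely.

```python
import random
from collections import deque
from itertools import combinations


def random_topology(n, extra_links, rng):
    """Hypothetical generator: a ring of n >= 3 nodes (so the graph is
    connected) plus `extra_links` randomly chosen shortcut links."""
    edges = {tuple(sorted((i, (i + 1) % n))) for i in range(n)}
    candidates = [e for e in combinations(range(n), 2) if e not in edges]
    edges.update(rng.sample(candidates, extra_links))
    return edges


def objectives(n, edges):
    """Return (average distance, diameter, link count), all to be minimized.
    These three objectives are assumed stand-ins, not the paper's."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    total, diameter = 0, 0
    for src in range(n):
        # Breadth-first search gives hop distances from src to every node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return (total / (n * (n - 1)), diameter, len(edges))


def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Score a small random population of 16-node topologies and keep the
# non-dominated ones.
rng = random.Random(0)
population = [random_topology(16, rng.randint(0, 16), rng) for _ in range(30)]
scores = [objectives(16, topo) for topo in population]
front = pareto_front(scores)
```

Because adding links tends to shorten distances while raising cost, the surviving set usually contains several trade-off points rather than a single best topology, which is why a multi-objective search returns a front of candidates instead of one answer.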
