LFTI: A New Performance Metric for Assessing Interconnect Designs for Extreme-Scale HPC Systems

Traditionally, interconnect performance is either characterized by simple topological parameters such as bisection bandwidth or studied through simulation, which gives detailed performance information for the scenarios simulated. Neither approach provides a good performance overview for extreme-scale interconnects: topological parameters are not directly related to application-level communication performance, while simulation complexity limits the number of scenarios that can be investigated. In this work, we propose a new performance metric, called the LANL-FSU Throughput Indices (LFTI), for characterizing the throughput performance of interconnect designs. LFTI combines the simplicity of topological parameters with the accuracy of simulation: like topological parameters, LFTI can be derived from the interconnect specification; at the same time, it directly reflects application-level communication performance. Moreover, when the theoretical throughput for each communication pattern can be modeled efficiently for an interconnect, the interconnect's LFTI can be computed efficiently. These features potentially allow LFTI to be used for rapid and comprehensive evaluation and comparison of extreme-scale interconnect designs. We demonstrate the effectiveness of LFTI by using it to evaluate and explore the design space of a number of large-scale interconnect designs.
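
The abstract does not give the formal definition of LFTI, so the following is only a minimal sketch of how a throughput index of this kind could be computed. The pattern set, the normalization against an ideal reference network, and the harmonic-mean aggregation are illustrative assumptions, not the paper's definition.

```python
# Minimal sketch: turn per-pattern theoretical throughputs into one index.
# Pattern set, ideal-reference normalization, and harmonic-mean aggregation
# are assumptions for illustration, not the LFTI definition from the paper.
from statistics import harmonic_mean

def throughput_index(throughput, ideal, aggregate=harmonic_mean):
    """Aggregate per-pattern throughputs, each normalized by the ideal
    reference throughput for that pattern, into one scalar index."""
    ratios = [throughput[p] / ideal[p] for p in throughput]
    return aggregate(ratios)

# Hypothetical per-node throughputs (flits/cycle) for three traffic patterns
# on a candidate interconnect versus an ideal reference network.
candidate = {"uniform": 0.95, "transpose": 0.40, "nearest-neighbor": 1.00}
reference = {"uniform": 1.00, "transpose": 1.00, "nearest-neighbor": 1.00}

print(f"aggregate throughput index: {throughput_index(candidate, reference):.3f}")
```

In this sketch the harmonic mean pulls the index toward the worst-served pattern, one plausible choice when a single number should not hide a pattern that the interconnect handles poorly.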
