Mitigating Inter-Job Interference Using Adaptive Flow-Aware Routing

On most high-performance computing platforms, concurrently executing jobs share network resources. This sharing can lead to inter-job network interference, which can significantly degrade the performance of communication-intensive applications. No satisfactory solution yet exists for mitigating this degradation on systems that allow jobs to share the network for the sake of higher utilization. In this paper, we analyze network congestion caused by multi-job workloads on two production systems that use popular network topologies: fat-tree and dragonfly. For each system, we establish a regression model that relates network hotspots to application performance degradation. The models show that current routing strategies are ineffective at balancing network traffic and mitigating interference on production systems. We propose an alternative routing strategy, which we call adaptive flow-aware routing. We implement our strategy on a fat-tree system, and tests on that system show up to a 46% improvement in job run time compared to the default routing.
