Understanding Performance Variability on the Aries Dragonfly Network

This work evaluates performance variability in the Cray Aries dragonfly network and characterizes its impact on MPI Allreduce. The execution time of Allreduce is limited by the performance of the slowest participating process, which can vary by more than an order of magnitude. We use counters from the network routers to better understand how competing workloads influence performance. Specifically, we examine the relationships among message size, process count, Aries counters, and Allreduce communication time. Our results suggest that competing traffic from other jobs can significantly affect performance on the Aries dragonfly network. Furthermore, we show that Aries network counters are a valuable tool, explaining up to 70% of the performance variability in our experiments on a large-scale production system.
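The two quantitative ideas in the abstract can be illustrated with a minimal Python sketch. It assumes that "explaining" variability means the coefficient of determination (R²) of a least-squares fit, which the abstract does not state. The function names `allreduce_time` and `r_squared` and all numbers in the usage are hypothetical; they are not Aries counter values or measurements from this work.

```python
from statistics import mean


def allreduce_time(per_process_times):
    """Model a collective as finishing only when its slowest process finishes."""
    return max(per_process_times)


def r_squared(x, y):
    """Fraction of the variance in y explained by a least-squares line on x.

    Assumption: a single predictor (for example, one counter value per run).
    For simple linear regression, R^2 equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-process times (arbitrary units): one slow process
    # determines the time of the whole collective.
    print(allreduce_time([1.0, 1.2, 15.0]))

    # Hypothetical counter readings and Allreduce times for five runs.
    counter = [10, 20, 30, 40, 50]
    elapsed = [1.1, 1.9, 3.2, 3.8, 5.3]
    print(r_squared(counter, elapsed))
```

Under this reading, a value near 0.7 for `r_squared` would correspond to counters accounting for about 70% of the observed variance. A study with several counters would more likely use a multivariate model than the single-predictor fit sketched here.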
