Improving Task Placement for Applications with 2D, 3D, and 4D Virtual Cartesian Topologies on 3D Torus Networks with Service Nodes

We describe two new methods for mapping applications with multidimensional virtual Cartesian process topologies onto 3D torus networks with randomly distributed service nodes. The first method, “Adaptive Layout”, works for any number of processes and distributes the MILC (lattice QCD, 4D topology) workload so that communicating processes are close together on the torus. This scheme reduces the run time by 2.7X compared to default placement. The second method, “Topaware”, selects a prism of nodes slightly larger than the ideal prism one would select if there were no service nodes. The application’s processes are ordered to group neighboring processes on the same node and to place groups of neighbors onto nodes that are no more than a few hops apart. Run time reductions of up to 40% are obtained for 2D and 3D virtual topologies. In dedicated mode, using Topaware with MILC reduces the run time by 3.7X compared to default placement.

Keywords: topology awareness, task placement, torus
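The idea of grouping neighboring processes on the same node can be illustrated with a small sketch. This is not the Topaware algorithm itself: the function names (`torus_hops`, `blocked_order`, `off_node_neighbor_pairs`), the 2D non-periodic grid, and the tile shape are all assumptions chosen for illustration. The sketch compares row-major rank ordering with a tiled ordering and counts how many nearest-neighbor pairs end up on different nodes; it also shows how hop distance on a torus with wraparound links can be computed.

```python
from itertools import product

def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def blocked_order(grid, block):
    """Order the ranks of a 2D virtual Cartesian grid so that each consecutive
    run of block[0] * block[1] ranks forms one rectangular tile of neighbors."""
    rows, cols = grid
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "tile must divide the grid evenly"
    order = []
    for tr, tc in product(range(0, rows, br), range(0, cols, bc)):
        for r, c in product(range(tr, tr + br), range(tc, tc + bc)):
            order.append(r * cols + c)  # row-major rank id of grid cell (r, c)
    return order

def off_node_neighbor_pairs(order, grid, ranks_per_node):
    """Count nearest-neighbor pairs (non-periodic grid) placed on different nodes,
    assuming ranks are packed onto nodes in the sequence given by `order`."""
    rows, cols = grid
    node_of = {rank: i // ranks_per_node for i, rank in enumerate(order)}
    count = 0
    for r in range(rows):
        for c in range(cols):
            rank = r * cols + c
            if c + 1 < cols and node_of[rank] != node_of[rank + 1]:
                count += 1
            if r + 1 < rows and node_of[rank] != node_of[rank + cols]:
                count += 1
    return count

if __name__ == "__main__":
    grid, ranks_per_node = (4, 4), 4
    default = list(range(grid[0] * grid[1]))   # row-major placement
    tiled = blocked_order(grid, (2, 2))        # 2x2 tiles, one tile per node
    print(off_node_neighbor_pairs(default, grid, ranks_per_node))
    print(off_node_neighbor_pairs(tiled, grid, ranks_per_node))
    print(torus_hops((0, 0, 0), (7, 0, 0), (8, 8, 8)))
```

In this toy case the tiled ordering leaves fewer neighbor pairs crossing node boundaries than the row-major ordering. Placing those tiles on torus nodes that are only a few hops apart is the second half of the idea and is not modeled here.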
