Topology-aware job mapping

A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments. Its effectiveness relies on resource selection techniques that find the most suitable resources on which to schedule users' jobs. This article introduces a new method that uses both the network topology of the machine and the communication pattern of the application to determine the best choice among the available nodes of the platform. To validate our approach, we integrate this algorithm as a plugin for the Simple Linux Utility for Resource Management (Slurm), a well-known and widespread RJMS. We assess our plugin with different optimization schemes by comparing it with the default topology-aware Slurm algorithm, using both emulation and simulation of a large-scale platform, and by carrying out experiments on a real cluster. We show that transparently taking into account a job's communication pattern and the topology yields meaningful performance gains.
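
To make the idea concrete, here is a minimal sketch of how a candidate node selection could be scored against a job's communication pattern. It is not the article's algorithm or the Slurm plugin: the two-level tree model, the hop counts, the node and switch names, and the functions `hop_distance` and `mapping_cost` are all assumptions chosen for illustration.

```python
from itertools import combinations


def hop_distance(node_a, node_b, switch_of):
    """Hops between two nodes in a hypothetical two-level tree network."""
    if node_a == node_b:
        return 0
    # Same leaf switch: up and down (2 hops); otherwise via the root (4 hops).
    return 2 if switch_of[node_a] == switch_of[node_b] else 4


def mapping_cost(mapping, comm, switch_of):
    """Sum of communication volume times hop distance over all process pairs."""
    return sum(
        comm[i][j] * hop_distance(mapping[i], mapping[j], switch_of)
        for i, j in combinations(range(len(mapping)), 2)
    )


# Hypothetical platform: four nodes attached to two leaf switches.
switch_of = {"n0": "s0", "n1": "s0", "n2": "s1", "n3": "s1"}

# Hypothetical symmetric communication matrix for four processes:
# processes 0 and 1 exchange a lot of data, as do processes 2 and 3.
comm = [
    [0, 10, 1, 1],
    [10, 0, 1, 1],
    [1, 1, 0, 10],
    [1, 1, 10, 0],
]

# mapping[i] is the node that hosts process i.
oblivious = ["n0", "n2", "n1", "n3"]  # heavy pairs split across switches
aware = ["n0", "n1", "n2", "n3"]      # heavy pairs share a leaf switch

cost_oblivious = mapping_cost(oblivious, comm, switch_of)
cost_aware = mapping_cost(aware, comm, switch_of)
print(cost_oblivious, cost_aware)
```

Under this toy cost model, the placement that keeps heavily communicating processes under the same leaf switch scores lower. A real selection algorithm would also have to handle node availability, larger topologies, and the cost of computing the placement itself.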
