Locality-aware policies to improve job scheduling on 3D tori

This paper studies how contiguous job placement affects the performance of schedulers for large-scale computing systems. Unlike non-contiguous strategies, contiguous partitioning lets applications exploit communication locality and reduces inter-application interference. However, it also increases scheduling times and system fragmentation, which degrades system utilization. We propose and evaluate several strategies for selecting the contiguous partitions to which incoming jobs are allocated. These strategies are combined with different mapping mechanisms, which perform the task-to-node assignment, to further reduce application run times. We carried out a simulation-based study using a collection of synthetic applications that perform common communication patterns. The results show that exploiting communication locality through an appropriate combination of partitioning and mapping effectively reduces application run times, and that the gains more than compensate for the scheduling inefficiency, resulting in better overall system performance.
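
To make the idea of contiguous partitioning concrete, the following minimal sketch searches a 3D torus for a free cuboid of a requested shape. It uses a plain first-fit scan only as a familiar illustration; the abstract does not specify which selection strategies the paper evaluates. The function names `find_first_fit` and `partition_nodes`, and the representation of busy nodes as a set of coordinates, are assumptions made for this example.

```python
from itertools import product


def partition_nodes(origin, dims, shape):
    """List the nodes of an a*b*c cuboid anchored at origin, wrapping around the torus."""
    ox, oy, oz = origin
    X, Y, Z = dims
    a, b, c = shape
    return [((ox + i) % X, (oy + j) % Y, (oz + k) % Z)
            for i, j, k in product(range(a), range(b), range(c))]


def find_first_fit(occupied, dims, shape):
    """Return the origin of the first free cuboid of the given shape, or None.

    occupied: set of (x, y, z) nodes already assigned to jobs (assumed representation)
    dims:     (X, Y, Z) size of the torus
    shape:    (a, b, c) size of the requested contiguous partition
    """
    # A partition larger than the torus in any dimension can never fit.
    if any(s > d for s, d in zip(shape, dims)):
        return None
    # Try every node as a candidate origin, in lexicographic order.
    for origin in product(range(dims[0]), range(dims[1]), range(dims[2])):
        if not any(node in occupied for node in partition_nodes(origin, dims, shape)):
            return origin
    return None


# Example: in a 4x4x4 torus, the planes x = 1 and x = 2 are busy.
busy = {(x, y, z) for x in (1, 2) for y in range(4) for z in range(4)}
origin = find_first_fit(busy, (4, 4, 4), (2, 1, 1))
```

In this example the only free pairs of nodes adjacent along x span the wraparound link between x = 3 and x = 0, so the search succeeds only because partitions are allowed to wrap. The exhaustive scan also hints at why contiguous allocation raises scheduling time: every candidate origin may require checking all nodes of the requested partition.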
