Paper Information - A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems

Job placement plays a pivotal role in application performance on supercomputers. We present a multi-faceted exploration of ways to influence placement in extreme-scale systems in order to improve network performance and decrease variability. In our first exploration, Scores, we developed a machine learning model that extracts features from a job's node allocation and grades performance. This identified several important node metrics, which led to Dual-Ended scheduling, a means of reducing network contention without impacting utilization. In evaluations on the Titan supercomputer, we observed reductions in average hop count of up to 50%. We also developed an improved node-layout strategy that targets a better balance between network latency and bandwidth; replacing the default ALPS layout on Titan with this strategy yielded an average runtime improvement of 10%. Both efforts underscore the importance of a job placement strategy that is cognizant of workload mixture and network topology.
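To make the hop-count metric in the abstract concrete, the following is a minimal sketch of how an average hop count could be computed for one job's node allocation on a 3D torus. It is an illustration only, not the paper's method: the pairwise-average definition, the function names `torus_hops` and `average_hop_count`, the coordinate representation of nodes, and the 8x8x8 torus size are all assumptions made for this example.

```python
from itertools import combinations


def torus_hops(a, b, dims):
    """Minimal hop count between two coordinates on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def average_hop_count(nodes, dims):
    """Mean hop count over all unordered node pairs in one allocation."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    return sum(torus_hops(a, b, dims) for a, b in pairs) / len(pairs)


# Hypothetical torus size chosen for illustration; not Titan's real dimensions.
dims = (8, 8, 8)

# Two hypothetical four-node allocations: one compact, one scattered.
compact = [(x, y, 0) for x in range(2) for y in range(2)]
scattered = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (4, 4, 4)]

print(average_hop_count(compact, dims))
print(average_hop_count(scattered, dims))
```

Under these assumptions, the compact allocation scores a lower average hop count than the scattered one, which is the kind of difference a placement-aware scheduler would try to exploit.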

Saurabh Gupta | Scott Atchley | Christopher Zimmer | Carl Albing | Sudharshan S. Vazhkudai
