Highly scalable SFC-based dynamic load balancing and its application to atmospheric modeling

Abstract: Load balance is one of the major challenges for efficient supercomputing, especially for applications that exhibit workload variations. Various dynamic load balancing and workload partitioning methods have been developed to handle this issue by periodically migrating workload between nodes at runtime. However, on today's top HPC systems, and even more so on future exascale systems, the runtime performance and scalability of these methods become a concern, because the costs of dynamic load balancing can exceed its benefits. In this work, we focus on methods based on space-filling curves (SFC), a well-established and comparatively fast approach for workload partitioning. SFCs reduce the partitioning problem from n dimensions to one dimension. The remaining task, the so-called 1D partitioning problem or chains-on-chains partitioning problem, is to decompose a 1D workload array into consecutive, balanced partitions. Published parallel heuristics for this problem cannot reliably deliver the required workload balance, especially at large scale, while exact algorithms are infeasible due to their sequential nature. We therefore propose a hierarchical method that combines a heuristic and an exact algorithm and allows a trade-off between these two approaches. We compare load balance, execution time, application communication, and task migration of the algorithms using real-life workload data from two different applications on two different HPC systems. The hierarchical method provides a significant speed-up compared to exact algorithms and yet achieves nearly the optimal load balance. On a Blue Gene/Q system, it partitions 2.6 million tasks for 524 288 processes with over 99% of the optimal balance in only 23.4 ms, while a published fast exact algorithm requires 6.4 s. We also provide a comparison to parallel load balancing methods implemented in the Zoltan library and present results from applying our methods to COSMO-SPECS+FD4, a detailed atmospheric simulation model that requires frequent dynamic load balancing to run efficiently at large scale.
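To make the 1D partitioning problem concrete, the following minimal Python sketch contrasts a simple prefix-sum heuristic with an exact bottleneck search on a small workload array. It is an illustration only, not the hierarchical method or any of the published algorithms the abstract refers to. The function names (`heuristic_partition`, `probe`, `exact_bottleneck`, `bottleneck`), the integer task weights, and the sample data are all assumptions made for this example.

```python
from bisect import bisect_right
from itertools import accumulate


def heuristic_partition(weights, parts):
    """Heuristic: put cut k where the prefix sum is closest to k * total / parts.

    Returns cut indices; partition k covers weights[cuts[k]:cuts[k + 1]].
    """
    prefix = [0] + list(accumulate(weights))
    total = prefix[-1]
    cuts = [0]
    for k in range(1, parts):
        target = k * total / parts
        i = bisect_right(prefix, target)  # first prefix entry above the target
        # Step back if the entry below the target is at least as close.
        if i >= len(prefix) or prefix[i] - target >= target - prefix[i - 1]:
            i -= 1
        cuts.append(max(i, cuts[-1]))  # keep cuts non-decreasing
    cuts.append(len(weights))
    return cuts


def bottleneck(weights, cuts):
    """Largest partition load, i.e. the quantity that load balancing minimizes."""
    return max(sum(weights[a:b]) for a, b in zip(cuts, cuts[1:]))


def probe(weights, parts, limit):
    """Greedy check: can the array be split into at most `parts` consecutive
    partitions whose loads do not exceed `limit`?"""
    used, load = 1, 0
    for w in weights:
        if w > limit:
            return False
        if load + w > limit:
            used, load = used + 1, w  # start a new partition with this task
        else:
            load += w
    return used <= parts


def exact_bottleneck(weights, parts):
    """Exact optimal bottleneck for integer weights via bisection over probe()."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, parts, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    tasks = [2, 2, 2, 9, 1, 1, 1, 6]  # hypothetical task weights in SFC order
    cuts = heuristic_partition(tasks, 3)
    print("heuristic cuts:", cuts, "bottleneck:", bottleneck(tasks, cuts))
    print("optimal bottleneck:", exact_bottleneck(tasks, 3))
```

On this sample, the heuristic yields partition loads of 6, 10, and 8 (bottleneck 10), whereas the exact search finds a bottleneck of 9. The probe loop visits tasks one after another, which hints at why exact approaches are harder to parallelize than prefix-sum heuristics.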
