Dynamic load balancing for petascale quantum Monte Carlo applications: The Alias method

Abstract

Diffusion Monte Carlo is a highly accurate Quantum Monte Carlo method for electronic structure calculations of materials, but it requires frequent load balancing (population redistribution) steps to maintain efficiency on parallel machines. This step can significantly affect performance, and it will become more important as the number of processing elements increases. We propose a new dynamic load balancing algorithm, the Alias Method, and evaluate it theoretically and empirically. An important feature of the new algorithm is that the load can be perfectly balanced with each process receiving at most one message. It is also optimal in the maximum size of messages received by any process. We also optimize its implementation to reduce network contention, a process facilitated by the low messaging requirement of the algorithm: a simple renumbering of the MPI ranks based on proximity and a space-filling curve significantly improves the MPI_Allgather performance. Empirical results on the petaflop Cray XT Jaguar supercomputer at ORNL show up to 30% improvement in performance on 120,000 cores. The load balancing algorithm can be implemented straightforwardly in existing codes. It may also be employed by any method with many nearly identical computational tasks that require load balancing.
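The "at most one message received per process" property can be illustrated with a minimal sketch modeled on Walker's alias-table construction. This is not the authors' implementation: the function name `alias_balance`, the list-of-walker-counts input, and the assumption that the total load divides evenly among the processes are all choices made here for illustration only.

```python
def alias_balance(loads):
    """Plan transfers that equalize integer loads (illustrative sketch).

    loads[i] is the number of walkers on process i. Returns a list of
    (sender, receiver, count) tuples. Assumes sum(loads) is divisible
    by len(loads); this is a simplification made for the sketch.
    """
    n = len(loads)
    total = sum(loads)
    if total % n != 0:
        raise ValueError("sketch assumes the total load divides evenly")
    target = total // n

    work = list(loads)  # running loads as transfers are planned
    small = [i for i, w in enumerate(work) if w < target]  # underloaded
    large = [i for i, w in enumerate(work) if w > target]  # overloaded

    transfers = []
    while small:
        s = small.pop()
        l = large.pop()  # non-empty whenever some process is underloaded
        deficit = target - work[s]
        # One message fills s exactly, so s never needs to receive again.
        transfers.append((l, s, deficit))
        work[s] = target
        work[l] -= deficit
        if work[l] > target:
            large.append(l)   # still has surplus to give away
        elif work[l] < target:
            small.append(l)   # gave too much; will receive once later
    return transfers


if __name__ == "__main__":
    for sender, receiver, count in alias_balance([7, 1, 4, 0, 8]):
        print(f"process {sender} sends {count} walkers to process {receiver}")
```

In this sketch every receiver is topped up to the target by a single message, so no message exceeds the target load, and a sender only ever gives away walkers it held initially, which means all planned messages could be sent at the same time. A process that sends more than its surplus becomes underloaded and is itself filled by exactly one later message.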
