A New Approach To Load Balance for Parallel Compositional Simulation Based on Reservoir Model Over-Decomposition

The quest for efficient and scalable parallel reservoir simulators has evolved alongside advances in high-performance computing architectures. Among the challenges to efficiency and scalability, load imbalance remains a major obstacle that has not been fully resolved. Its causes in parallel reservoir simulation are both static and dynamic. Robust graph partitioning algorithms can handle static load imbalance by decomposing the underlying reservoir geometry so that each processor receives a roughly equal load. However, the loads determined by a static load balancer seldom remain unchanged as the simulation advances in time. This so-called dynamic imbalance is further exacerbated in parallel compositional simulations. The flash calculations for equations of state in complex compositional simulations can consume over half of the total execution time, and they are difficult to balance with a static load balancer alone. The computational cost of the flash calculation in each grid block depends heavily on dynamic data such as pressure, temperature, and hydrocarbon composition. Thus, any static assignment of grid blocks may lead to dynamic load imbalance in unpredictable ways. A dynamic load balancer can often resolve this difficulty, but traditional techniques are inflexible and tedious to implement in legacy reservoir simulators. In this paper, we present a new approach to addressing dynamic load imbalance in parallel compositional simulation. It over-decomposes the reservoir model so that each processor is assigned a bundle of subdomains. Processors treat these subdomains as virtual processes, or user-level migratable threads, which the run-time system can migrate dynamically across processors. This technique is also shown to achieve better overlap between computation and communication and better cache efficiency. We employ this approach in a legacy reservoir simulator and demonstrate a reduction in the execution time of parallel compositional simulations while requiring minimal changes to the source code. Finally, we show that domain over-decomposition combined with a load balancer can improve speedup from 29.27 to 62.38 on 64 physical processors for a realistic simulation problem.
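The idea of over-decomposition followed by migration can be illustrated with a minimal Python sketch. It is not the run-time system or the load balancer used in the paper: the greedy strategy, the function names, and the per-subdomain costs below are all assumptions chosen for illustration. The sketch splits a model into more subdomains than processors, then compares a static contiguous assignment with a measurement-based greedy reassignment.

```python
import heapq

def block_assignment(costs, num_procs):
    """Static assignment: contiguous, equal-sized bundles of subdomains."""
    per_proc = len(costs) // num_procs
    return [min(i // per_proc, num_procs - 1) for i in range(len(costs))]

def greedy_rebalance(costs, num_procs):
    """Hypothetical balancer: place the heaviest subdomains first,
    always on the currently least-loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (load, processor id)
    heapq.heapify(heap)
    owner = [0] * len(costs)
    for i in sorted(range(len(costs)), key=lambda k: -costs[k]):
        load, p = heapq.heappop(heap)
        owner[i] = p
        heapq.heappush(heap, (load + costs[i], p))
    return owner

def proc_loads(costs, owner, num_procs):
    """Total measured cost per processor for a given assignment."""
    loads = [0.0] * num_procs
    for c, p in zip(costs, owner):
        loads[p] += c
    return loads

def imbalance(loads):
    """Ratio of the maximum load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))

# Assumed example: 4 processors, over-decomposition factor 4 (16 subdomains).
# The first four subdomains stand in for a region with expensive flash
# calculations; the cost values are made up for illustration.
NUM_PROCS = 4
costs = [5.0] * 4 + [1.0] * 12

static_owner = block_assignment(costs, NUM_PROCS)
balanced_owner = greedy_rebalance(costs, NUM_PROCS)

print("static imbalance:  ", imbalance(proc_loads(costs, static_owner, NUM_PROCS)))
print("balanced imbalance:", imbalance(proc_loads(costs, balanced_owner, NUM_PROCS)))
```

With one subdomain per processor there would be nothing to move. Having several subdomains per processor is what gives the balancer units of work to migrate. For reference, the reported speedups of 29.27 and 62.38 on 64 processors correspond to parallel efficiencies of roughly 46% and 97%.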
