Towards Data-Flow Parallelization for Adaptive Mesh Refinement Applications

Adaptive Mesh Refinement (AMR) is a prevalent method used by distributed-memory simulation applications to adapt the accuracy of their solutions to the conditions in each region of their domain. These applications are usually dynamic, since their domain areas are refined or coarsened in successive refinement stages during execution; thus, they periodically redistribute their workloads among processes to avoid load imbalance. Although MPI is the de facto standard for scientific computing in distributed environments, in recent years pure MPI applications have been ported to hybrid ones in an attempt to cope with modern multi-core systems. Recently, the Task-Aware MPI (TAMPI) library was proposed to efficiently integrate MPI communications with tasking models, also providing transparent management of the communications issued by tasks. In this paper, we demonstrate the benefits of porting AMR applications to data-flow programming models by leveraging that novel hybrid approach. We exploit most of the application's parallelism by taskifying all stages, allowing them to overlap naturally. We apply these techniques to the miniAMR proxy application, which mimics the refinement, load balancing, communication, and computation patterns of general AMR applications. We evaluate how this approach reduces the time spent in the computation and communication phases while achieving better programmability than other conventional hybrid techniques.
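The taskification described above can be illustrated with a minimal sketch of a halo exchange, using OmpSs-2-style task pragmas over blocking MPI calls, which TAMPI can transparently turn into non-blocking operations that pause the issuing task instead of the thread. This is an illustrative sketch only, not code from the paper: the function name `exchange_halos`, the array-section syntax, and the `tampi.h` header name are assumptions, and compiling it requires the OmpSs-2 toolchain, an MPI implementation, and the TAMPI library.

```
// Hedged sketch: taskified halo exchange with TAMPI's blocking mode.
// The data-flow annotations (in/out) let the runtime overlap these
// communication tasks with computation tasks on other blocks.
#include <mpi.h>
#include <tampi.h>  /* assumed header name for the TAMPI library */

void exchange_halos(double *send_buf, double *recv_buf, int n,
                    int neighbor, MPI_Comm comm)
{
    /* Task that sends this block's boundary; depends on send_buf. */
    #pragma oss task in(send_buf[0;n]) label("halo_send")
    MPI_Send(send_buf, n, MPI_DOUBLE, neighbor, 0, comm);

    /* Task that receives the neighbor's boundary; produces recv_buf.
       With TAMPI, the blocking MPI_Recv pauses only this task, so
       the worker thread can run other ready tasks meanwhile. */
    #pragma oss task out(recv_buf[0;n]) label("halo_recv")
    MPI_Recv(recv_buf, n, MPI_DOUBLE, neighbor, 0, comm,
             MPI_STATUS_IGNORE);
}
```

Because the dependencies are expressed on the buffers themselves, subsequent computation tasks that read `recv_buf` are automatically ordered after the receive, which is what enables the natural overlap of communication, refinement, and computation stages claimed above.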
