JSweep: A Patch-centric Data-driven Approach for Parallel Sweeps on Large-scale Meshes

In mesh-based numerical simulations, the sweep is an important computation pattern. When sweeping a mesh, computations on cells are strictly ordered by data dependencies along given directions. This serial ordering makes sweeps challenging to parallelize, especially on unstructured meshes and deforming structured meshes. Meanwhile, recent high-fidelity multi-physics simulations of particle transport, such as those for nuclear reactors and inertial confinement fusion, require sweeps on large-scale meshes with billions of cells and hundreds of directions. In this paper, we present JSweep, a parallel data-driven computational framework integrated in the JAxMIN infrastructure. At the core of JSweep is a general patch-centric data-driven abstraction, coupled with a high-performance runtime system that leverages hybrid MPI+threads parallelism and achieves dynamic communication on contemporary multi-core clusters. Built on JSweep, we implement a representative data-driven algorithm, Sn transport, featuring optimizations of vertex clustering, a multi-level priority strategy, and patch-angle parallelism. Experimental evaluation with two real-world applications, on structured and unstructured meshes respectively, demonstrates that JSweep can scale to tens of thousands of processor cores with reasonable parallel efficiency.
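
To make the dependency ordering concrete, the following is a minimal serial sketch of a data-driven sweep on a small 2D structured grid. It is an illustration only, not JSweep's implementation or API: the function name `sweep_order`, the two-component sign vector `direction`, and the rule that each cell depends on its upwind neighbour along each axis are all assumptions chosen for this example. A cell becomes ready once all of its upwind neighbours have been processed, which amounts to a topological traversal of the dependency graph for that direction.

```python
from collections import deque


def sweep_order(nx, ny, direction):
    """Return one valid sweep order for an nx-by-ny grid (hypothetical helper).

    direction is (sx, sy), each component +1 or -1. A cell (i, j) is assumed
    to depend on its upwind neighbours (i - sx, j) and (i, j - sy).
    """
    sx, sy = direction

    def inside(a, b):
        return 0 <= a < nx and 0 <= b < ny

    # Count the unresolved upwind dependencies of every cell.
    pending = {}
    for i in range(nx):
        for j in range(ny):
            upwind = [(i - sx, j), (i, j - sy)]
            pending[(i, j)] = sum(1 for a, b in upwind if inside(a, b))

    # Cells with no upwind neighbour inside the grid can start immediately.
    ready = deque(cell for cell, n in pending.items() if n == 0)
    order = []
    while ready:
        i, j = ready.popleft()
        order.append((i, j))  # the per-cell computation would happen here
        # Finishing this cell releases its downwind neighbours.
        for a, b in [(i + sx, j), (i, j + sy)]:
            if inside(a, b):
                pending[(a, b)] -= 1
                if pending[(a, b)] == 0:
                    ready.append((a, b))
    return order
```

Calling `sweep_order(3, 3, (1, 1))` starts at cell `(0, 0)` and ends at `(2, 2)`, while `sweep_order(3, 3, (-1, 1))` starts at `(2, 0)`. Each sweep direction induces a different ordering over the same cells.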
