A domain-specific language and scalable middleware for particle-mesh simulations on heterogeneous parallel computers

Alongside theory and experiment, computing has become the third pillar of science. To meet the increasing demand for computing power, high-performance computer systems are becoming larger and more complex. At the same time, the usability and programmability of these systems have to be maintained for a growing community of scientists who use computational tools. In computational science, hybrid particle-mesh methods provide a versatile framework for simulating both discrete and continuous models, either deterministically or stochastically. The parallel particle mesh (PPM) library is a software middleware providing a transparent interface for particle-mesh methods on distributed-memory computers. This thesis presents the design and implementation of algorithms, data structures, and software systems that simplify the development of efficient parallel adaptive-resolution particle-mesh simulations on heterogeneous hardware platforms. We propose a new domain-specific language for parallel hybrid particle-mesh methods, the parallel particle mesh language (PPML). This language provides abstract types, operators, and iterators for particle-mesh methods, using the PPM library as a runtime system. We also present a graphical programming environment, called webCG, that allows rapid visual prototyping of PPML programs from any web browser. These developments are accompanied by several extensions to the PPM library itself. We redesign the PPM library core following an object-oriented paradigm. This allows abstract types and operators to be represented directly in PPM, which greatly simplifies the runtime support for PPML. A number of extensions address the use of PPM on heterogeneous multi- and manycore platforms, and for adaptive-resolution particle simulations. First, we present a Fortran 2003 POSIX threads wrapper library, which extends PPM to hybrid multi-processing/multi-threading environments. Second, we present a generic algorithm for 2D and 3D particle-mesh interpolation on streaming multi-processors, and a portable OpenCL implementation thereof. We benchmark this implementation on different general-purpose GPUs and compare its performance with that of sequential and OpenMP-parallel versions. This extends the PPM library to transparently support GPU acceleration. Third, we present a new communication scheduler based on graph vertex-coloring. We assess the asymptotic runtime and perform bench-
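
The idea of scheduling communication by graph vertex-coloring can be illustrated with a small sketch. This is not the scheduler developed in the thesis, whose details are not given here. It assumes that each pairwise exchange between two processes is treated as a vertex of a conflict graph, that two exchanges conflict when they share a process, and that a simple greedy heuristic assigns colors. The function name `schedule_rounds` and the data layout are hypothetical.

```python
from collections import defaultdict


def schedule_rounds(exchanges):
    """Group pairwise exchanges into conflict-free communication rounds.

    exchanges: iterable of (rank_a, rank_b) pairs, one per neighbor exchange.
    Returns a list of rounds; each round is a list of exchanges in which
    no rank appears more than once.
    (Hypothetical helper for illustration; greedy coloring is an assumption.)
    """
    # Normalize and deduplicate the exchanges. Each exchange is one vertex of
    # the conflict graph; two vertices conflict if they share a rank.
    vertices = sorted({tuple(sorted(pair)) for pair in exchanges})

    # Index exchanges by participating rank so conflicts can be looked up.
    by_rank = defaultdict(list)
    for v in vertices:
        for rank in v:
            by_rank[rank].append(v)

    def degree(v):
        # Number of other exchanges that share a rank with v.
        return len(by_rank[v[0]]) + len(by_rank[v[1]]) - 2

    # Greedy vertex coloring: visit the most constrained exchanges first and
    # give each one the smallest color not used by a conflicting exchange.
    color = {}
    for v in sorted(vertices, key=degree, reverse=True):
        used = {color[u] for rank in v for u in by_rank[rank] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c

    # Each color class becomes one communication round.
    rounds = [[] for _ in range(max(color.values(), default=-1) + 1)]
    for v, c in color.items():
        rounds[c].append(v)
    return rounds


if __name__ == "__main__":
    # Four processes arranged in a ring, each exchanging with both neighbors.
    for i, rnd in enumerate(schedule_rounds([(0, 1), (1, 2), (2, 3), (3, 0)])):
        print(f"round {i}: {rnd}")
```

All exchanges that receive the same color can proceed concurrently, because no process takes part in more than one of them. A greedy coloring does not in general minimize the number of rounds; it is used here only to keep the sketch short.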
