High throughput software for direct numerical simulations of compressible two-phase flows

We present an open-source, object-oriented software package for high-throughput direct numerical simulations of compressible two-phase flows. The Navier-Stokes equations are discretized on uniform grids using high-order finite volume methods. The software exploits recent CPU micro-architectures through explicit vectorization, and it adopts NUMA-aware techniques as well as data and computation reordering. We report unprecedented fractions of peak performance for a compressible flow solver: 45% of the peak on a single node (nominal performance of 840 GFLOP/s) and 30% on a cluster of 47,000 cores (nominal performance of 0.8 PFLOP/s). We suggest that the present work may serve as an upper bound, in terms of achievable GFLOP/s, for the performance of two-phase flow solvers that use adaptive mesh refinement. The software enables 3D simulations of shock-bubble interaction that include, for the first time, the effects of diffusion and surface tension, by efficiently employing two hundred billion computational elements.
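The abstract states that the equations are discretized with high-order finite volume methods on uniform grids, but it does not name the schemes. As a hedged illustration only, the sketch below shows what a conservative high-order finite-volume update looks like on a periodic 1D uniform grid. It assumes a scalar linear-advection model problem, a fifth-order WENO reconstruction with Jiang-Shu weights, an upwind interface flux, and Williamson's 2N-storage third-order Runge-Kutta time stepping. These choices and the function names (`weno5_left`, `rhs`, `step`) are assumptions for illustration, not the authors' code; the actual solver treats the 3D two-phase Navier-Stokes system with explicit SIMD vectorization, which this scalar Python sketch does not attempt.

```python
import math

# Hypothetical model problem (not the authors' solver): u_t + a*u_x = 0
# with a > 0, on a periodic, uniform 1D grid of cell averages.

EPS = 1e-6             # WENO regularization to avoid division by zero
D = (0.1, 0.6, 0.3)    # optimal linear weights of the three substencils

# Assumed time integrator: Williamson (1980) 2N-storage third-order RK.
RK_A = (0.0, -5.0 / 9.0, -153.0 / 128.0)
RK_B = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)


def weno5_left(vm2, vm1, v0, vp1, vp2):
    """Left-biased fifth-order WENO value at interface i+1/2 (cells i-2..i+2)."""
    # Third-order candidate reconstructions on the three substencils.
    q0 = (2 * vm2 - 7 * vm1 + 11 * v0) / 6
    q1 = (-vm1 + 5 * v0 + 2 * vp1) / 6
    q2 = (2 * v0 + 5 * vp1 - vp2) / 6
    # Smoothness indicators: large where a substencil crosses a jump.
    b0 = 13 / 12 * (vm2 - 2 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4 * vm1 + 3 * v0) ** 2
    b1 = 13 / 12 * (vm1 - 2 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13 / 12 * (v0 - 2 * vp1 + vp2) ** 2 + 0.25 * (3 * v0 - 4 * vp1 + vp2) ** 2
    # Nonlinear weights: close to D on smooth data, small on oscillatory stencils.
    a0 = D[0] / (EPS + b0) ** 2
    a1 = D[1] / (EPS + b1) ** 2
    a2 = D[2] / (EPS + b2) ** 2
    return (a0 * q0 + a1 * q1 + a2 * q2) / (a0 + a1 + a2)


def rhs(u, a, h):
    """Semi-discrete finite-volume operator -(F_{i+1/2} - F_{i-1/2}) / h."""
    n = len(u)
    # Upwind flux at interface i+1/2 (valid for a > 0), periodic indexing.
    flux = [a * weno5_left(u[(i - 2) % n], u[(i - 1) % n], u[i],
                           u[(i + 1) % n], u[(i + 2) % n])
            for i in range(n)]
    # For i = 0, flux[i - 1] is flux[-1]: the periodic interface at -1/2.
    return [-(flux[i] - flux[i - 1]) / h for i in range(n)]


def step(u, a, h, dt):
    """One low-storage RK3 step. The scheme needs only u and du per cell;
    this sketch rebuilds Python lists for clarity."""
    du = [0.0] * len(u)
    for ak, bk in zip(RK_A, RK_B):
        r = rhs(u, a, h)
        du = [ak * d + dt * ri for d, ri in zip(du, r)]
        u = [ui + bk * d for ui, d in zip(u, du)]
    return u


if __name__ == "__main__":
    n = 64
    h = 1.0 / n
    u = [math.sin(2 * math.pi * (i + 0.5) * h) for i in range(n)]
    for _ in range(160):  # 160 steps of dt = 1/160 advect exactly one period
        u = step(u, 1.0, h, 1.0 / 160)
    print(max(abs(x - math.sin(2 * math.pi * (i + 0.5) * h))
              for i, x in enumerate(u)))
```

Because each cell's update is the difference of two interface fluxes shared with its neighbours, the sum of the cell averages is conserved up to round-off with periodic boundaries. The interface loop is also where a production code would typically concentrate its vectorization effort, since every interface applies identical arithmetic to a contiguous five-cell stencil.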
