Paper information - High throughput software for direct numerical simulations of compressible two-phase flows

We present open-source, object-oriented software for high-throughput direct numerical simulations (DNS) of compressible two-phase flows. The Navier-Stokes equations are discretized on uniform grids using high-order finite volume methods. The software exploits recent CPU micro-architectures through explicit vectorization, NUMA-aware techniques, and the reordering of data and computation. We report a compressible flow solver that reaches unprecedented fractions of peak performance: 45% of peak on a single node (nominal performance of 840 GFLOP/s) and 30% on a cluster of 47,000 cores (nominal performance of 0.8 PFLOP/s). We suggest that the present work may serve as an upper bound on the achievable GFLOP/s for two-phase flow solvers that use adaptive mesh refinement. The software enables 3D simulations of shock-bubble interaction that include, for the first time, the effects of diffusion and surface tension, and that make efficient use of two hundred billion computational elements.
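The abstract gives performance only as fractions of nominal peak. As a rough illustration, the sketch below converts those fractions into sustained throughput. The helper name `sustained_flops` is hypothetical, and the resulting absolute figures are simple products of the numbers quoted above, not values reported by the authors.

```python
GIGA = 1e9   # FLOP/s in one GFLOP/s
PETA = 1e15  # FLOP/s in one PFLOP/s


def sustained_flops(nominal_flops: float, fraction_of_peak: float) -> float:
    """Return sustained FLOP/s given a nominal peak and the achieved fraction.

    Hypothetical helper for illustration; not part of the software described.
    """
    if not 0.0 <= fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must lie in [0, 1]")
    return nominal_flops * fraction_of_peak


# Figures quoted in the abstract: 45% of 840 GFLOP/s, 30% of 0.8 PFLOP/s.
node_sustained = sustained_flops(840 * GIGA, 0.45)
cluster_sustained = sustained_flops(0.8 * PETA, 0.30)

print(f"single node: {node_sustained / GIGA:.0f} GFLOP/s")
print(f"cluster:     {cluster_sustained / PETA:.2f} PFLOP/s")
```

Under these assumptions the single node sustains roughly 378 GFLOP/s and the cluster roughly 0.24 PFLOP/s.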
