Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures

We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 200 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state of the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are mediated by the surrounding plasma); and (3) we allow for a highly non-uniform distribution of RBCs in space. The new method has been implemented in the software library MOBO (for “Moving Boundaries”). We designed MOBO to support parallelism at all levels, including inter-node distributed-memory parallelism, intra-node shared-memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVIDIA's Tesla/Fermi platforms, in single and double floating-point precision. Overall, the code has scaled on 256 CPU-GPUs on the TeraGrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we achieved a sustained performance of 0.7 petaflop/s on Jaguar.
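
To make characteristic (2) concrete, the sketch below evaluates long-range Stokes interactions by direct summation of free-space Stokeslets, the standard point-force solution of the Stokes equations. It is an illustration only and is not taken from MOBO: the function name `stokeslet_velocity`, the point-force representation of the cells, and the default viscosity `mu=1.0` are assumptions made for this example. The direct sum costs O(N·M) operations for N sources and M targets, which is the cost a fast, scalable algorithm has to avoid at the problem sizes reported above.

```python
import math

def stokeslet_velocity(targets, sources, forces, mu=1.0):
    """Velocity at each target induced by point forces in a Stokes fluid.

    Direct O(N*M) evaluation of
        u(x) = 1/(8*pi*mu) * sum_j [ f_j / r + (r . f_j) r / r^3 ],
    with r = x - y_j. Hypothetical helper for illustration only.
    """
    scale = 1.0 / (8.0 * math.pi * mu)
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for y, f in zip(sources, forces):
            r = [x[k] - y[k] for k in range(3)]
            r2 = sum(c * c for c in r)
            if r2 == 0.0:
                continue  # skip the singular self-interaction
            inv_r = 1.0 / math.sqrt(r2)
            r_dot_f = sum(r[k] * f[k] for k in range(3))
            for k in range(3):
                u[k] += scale * (f[k] * inv_r + r_dot_f * r[k] * inv_r ** 3)
        velocities.append(u)
    return velocities

# One unit force along x at the origin, observed on the x and y axes.
u_along, u_across = stokeslet_velocity(
    targets=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    sources=[(0.0, 0.0, 0.0)],
    forces=[(1.0, 0.0, 0.0)],
)
```

In this example the velocity along the force direction is twice the velocity perpendicular to it at the same distance, and both decay only as 1/r, which is why every cell influences every other cell.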
