High-Performance Computing: Dos and Don’ts

Computational fluid dynamics (CFD) is the field of computational mechanics that has historically benefited most from advances in high-performance computing (HPC). HPC combines several techniques that make a simulation fast and efficient, including distributed-memory parallelism, shared-memory parallelism, vectorization, and memory-access optimization. We first present the anatomy of supercomputers, with special emphasis on the HPC aspects relevant to CFD. We then develop some of the HPC concepts and numerical techniques applied across the complete CFD simulation framework: preprocessing (meshing), the simulation itself (assembly and iterative solvers), and postprocessing (visualization).

[1]  Nigel P. Weatherill,et al.  Distributed parallel Delaunay mesh generation , 1999 .

[2]  Matthew G. Knepley,et al.  Efficient Mesh Management in Firedrake Using PETSc DMPlex , 2015, SIAM J. Sci. Comput..

[3]  Frédéric Magoulès,et al.  Asynchronous iterative sub-structuring methods , 2018, Math. Comput. Simul..

[4]  Guillaume Houzeaux,et al.  Deflated preconditioned conjugate gradient solvers for the pressure‐Poisson equation: Extensions and improvements , 2011 .

[5]  D. Venditti,et al.  Anisotropic grid adaptation for functional outputs: application to two-dimensional viscous flows , 2003 .

[6]  Mateo Valero,et al.  ALYA: MULTIPHYSICS ENGINEERING SIMULATION TOWARDS EXASCALE , 2014 .

[7]  Loïc Thebault,et al.  Scalable and Efficient Algorithms for Unstructured Mesh Computations , 2016 .

[8]  Jeremy S. Meredith,et al.  Parallel in situ coupling of simulation with a fully featured visualization system , 2011, EGPGV '11.

[9]  Thierry Coupez,et al.  Parallel meshing and remeshing , 2000 .

[10]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[11]  Jayadev Misra,et al.  A Constructive Proof of Vizing's Theorem , 1992, Inf. Process. Lett..

[12]  Thierry Coupez,et al.  Dynamic Parallel Adaption for Three Dimensional Unstructured Meshes: Application to Interface Tracking , 2008, IMR.

[13]  Sascha M. Schnepp,et al.  Pipelined, Flexible Krylov Subspace Methods , 2015, SIAM J. Sci. Comput..

[14]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[15]  Guillaume Houzeaux,et al.  Parallel Scientific Computing: Magoulès/Parallel Scientific Computing , 2015 .

[16]  Yvan Notay,et al.  A new algebraic multigrid approach for Stokes problems , 2016, Numerische Mathematik.

[17]  Richard W. Hamming,et al.  Numerical Methods for Scientists and Engineers , 1963 .

[18]  Charbel Farhat,et al.  A general approach to nonlinear FE computations on shared-memory multiprocessors , 1989 .

[19]  Artem Napov,et al.  A massively parallel solver for discrete Poisson-like problems , 2015, J. Comput. Phys..

[20]  Utkarsh Ayachit,et al.  ParaView Catalyst: Enabling In Situ Data Analysis and Visualization , 2015, ISAV@SC.

[21]  Andreas Lintermann,et al.  Rhinodiagnost: Morphological and Functional Precision Diagnostics of Nasal Cavities , 2017 .

[22]  Antonio Munjiza,et al.  On parallel pre‐conditioners for pressure Poisson equation in LES of complex geometry flows , 2017 .

[23]  Emmanuel Agullo,et al.  Analyzing the Effect of Local Rounding Error Propagation on the Maximal Attainable Accuracy of the Pipelined Conjugate Gradient Method , 2016, SIAM J. Matrix Anal. Appl..

[24]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[25]  Rainald Löhner,et al.  A linelet preconditioner for incompressible flow solvers , 2003 .

[26]  M. Benzi Preconditioning techniques for large linear systems: a survey , 2002 .

[27]  Jesús Labarta,et al.  Dynamic load balance applied to particle transport in fluids , 2016 .

[28]  Cédric Augonnet,et al.  StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators , 2012, EuroMPI.

[29]  R. Codina Pressure Stability in Fractional Step Finite Element Methods for Incompressible Flows , 2001 .

[30]  L. Barreira,et al.  Lyapunov Exponents and Smooth Ergodic Theory , 2002 .

[31]  John N. Tsitsiklis,et al.  Parallel and distributed computation , 1989 .

[32]  Gunther H. Weber,et al.  Performance Analysis, Design Considerations, and Applications of Extreme-Scale In Situ Infrastructures , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.

[33]  Guillaume Houzeaux,et al.  Extension of fractional step techniques for incompressible flows: The preconditioned Orthomin(1) for the pressure Schur complement , 2011 .

[34]  Jonathan W. Berry,et al.  Challenges in Parallel Graph Processing , 2007, Parallel Process. Lett..

[35]  Robert Sisneros,et al.  Damaris/Viz: A nonintrusive, adaptable and user-friendly in situ visualization framework , 2013, 2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV).

[36]  Jason Duell,et al.  Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters , 2006 .

[37]  Bernd Hamann,et al.  A Practical Approach to Morse-Smale Complex Computation: Scalability and Generality , 2008, IEEE Transactions on Visualization and Computer Graphics.

[38]  A. Quarteroni,et al.  Factorization methods for the numerical approximation of Navier-Stokes equations , 2000 .

[39]  Message P Forum,et al.  MPI: A Message-Passing Interface Standard , 1994 .

[40]  Jean Roman,et al.  Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping , 2017, J. Comput. Sci..

[41]  Wim Vanroose,et al.  Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm , 2014, Parallel Comput..

[42]  Matthew G. Knepley,et al.  petsc: Portable, Extensible Toolkit for Scientific Computation , 2016 .

[43]  Mark S. Shephard,et al.  Parallel refinement and coarsening of tetrahedral meshes , 1999 .

[44]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[45]  Elie Hachem,et al.  On optimal simplicial 3D meshes for minimizing the Hessian-based errors , 2016 .

[46]  Víctor López,et al.  MPI+X: task-based parallelisation and dynamic load balance of finite element assembly , 2018, International Journal of Computational Fluid Dynamics.

[47]  Hugues Digonnet,et al.  Mesh partitioning for parallel computational fluid dynamics applications on a grid , 2005 .

[48]  Eric Petit,et al.  Divide and Conquer Parallelization of Finite Element Method Assembly , 2013, PARCO.

[49]  Asif Afzal,et al.  Parallelization Strategies for Computational Fluid Dynamics Software: State of the Art Review , 2016, Archives of Computational Methods in Engineering.

[50]  Jesús Labarta,et al.  LeWI: A Runtime Balancing Algorithm for Nested Parallelism , 2009, 2009 International Conference on Parallel Processing.

[51]  Guillaume Houzeaux,et al.  Some Useful Strategies for Unstructured Edge-Based Solvers on Shared Memory Machines , 2011 .

[52]  Brian Cabral,et al.  Imaging vector fields using line integral convolution , 1993, SIGGRAPH.

[53]  E. Cuthill,et al.  Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.

[54]  Jeff P. Hultquist,et al.  Constructing stream surfaces in steady 3D vector fields , 1992, Proceedings Visualization '92.

[55]  Frédéric Magoulès,et al.  Asynchronous optimized Schwarz methods with and without overlap , 2017, Numerische Mathematik.

[56]  O. C. Zienkiewicz,et al.  A simple error estimator and adaptive procedure for practical engineerng analysis , 1987 .