Potential of chaotic iterative solvers for CFD

Computational Fluid Dynamics (CFD) has enjoyed the speed-ups offered by advances in supercomputer technology for many years. In the coming decade, however, the architecture of supercomputers will change, and CFD codes must adapt to remain current. Predictions of next-generation supercomputer architectures suggest that the first computer capable of 10^18 floating-point operations per second (1 ExaFLOPS) will arrive around 2020. Its architecture will be governed by electrical power limitations, whereas previously the main limitation was raw hardware speed. This has two significant repercussions. Firstly, due to the physical power limitations of modern chips, core clock rates will decrease in favour of increasing concurrency. This trend can already be seen in the growth of accelerated "many-core" systems, which use graphics processing units (GPUs) or co-processors. Secondly, reliance on inter-node networks, typically built from copper-wire or optical interconnect, must be reduced because of their proportionally large power consumption. This places more emphasis on shared-memory communication, with distributed-memory communication (predominantly MPI, the Message Passing Interface) becoming less important. The current most powerful computer, Tianhe-2, capable of 33 PetaFLOPS, consists of 3,120,000 cores. The first exascale machine, roughly 30 times more powerful, is likely to be 300 times more parallel, a far sharper growth in parallelism than anything seen over the last 50 years. This concurrency will come primarily from intra-node parallelization: whereas Tianhe-2 already features a large O(100) cores per node, an exascale machine must consist of O(1k-10k) cores per node. CFD has benefited from weak scalability (the ability to retain performance at a constant elements-per-core ratio) for many years; its strong scalability (the ability to reduce the elements-per-core ratio while retaining performance) has been poor and, until now, mostly irrelevant.
With the shift to massive parallelism in the next few years, the strong scalability of CFD codes must be investigated and improved. In this paper, a brief summary is given of earlier results, which identified the linear-equation-system solver as one of the least scalable parts of the code. Based on these results, a chaotic iterative solver, i.e. a totally asynchronous, non-stationary linear solver designed for high scalability, is proposed. This paper assesses the suitability of such a solver by investigating the linear equation systems produced by typical CFD problems. If the results are promising, future work will implement and test chaotic iterative solvers.
