Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

Abstract

Modern multicore and manycore processors expose multiple levels of parallelism through a wide range of architectural features, such as SIMD units for data-parallel execution and threads for core-level parallelism. Exploiting this multi-level parallelism is therefore crucial for achieving high performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel Sandy Bridge and Haswell multicore CPUs and on the Intel Xeon Phi Knights Corner coprocessor. Code optimisations are applied to two computational kernels with different computational patterns: the update of the flow variables and the evaluation of the Roe numerical fluxes. We discuss in detail the code transformations required for efficient SIMD computation in both kernels across the selected devices, including SIMD shuffles and transpositions for the flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on several domain decomposition techniques, together with optimisations that alleviate the NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assess their efficiency on each architecture. We report significant single-thread speedups for both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. At full node and chip concurrency, the computations achieve a threefold speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
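
The abstract mentions global memory transformations for efficient SIMD execution of the flow-variable update. One common transformation of this kind, assumed here rather than taken from the paper, converts an array-of-structures (AoS) layout into a structure-of-arrays (SoA) layout so that each conserved variable becomes a unit-stride stream. In the C sketch below, the five-variable state, the names `cell_aos`, `cells_soa`, `aos_to_soa` and `update_soa`, and the simple explicit update q <- q - dt * r are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: five conserved variables per cell
 * (rho, rho*u, rho*v, rho*w, rho*E). */
#define NVAR 5

/* Array of structures: all variables of one cell are adjacent in memory,
 * so loading q[v] for consecutive cells into one vector needs a strided gather. */
typedef struct {
    double q[NVAR];
} cell_aos;

/* Structure of arrays: each variable is one contiguous array over all cells. */
typedef struct {
    double *q[NVAR];
    size_t n;
} cells_soa;

/* Copy an AoS block of out->n cells into SoA storage preallocated by the caller. */
static void aos_to_soa(const cell_aos *in, cells_soa *out)
{
    assert(in != NULL && out != NULL);
    for (int v = 0; v < NVAR; ++v)
        for (size_t i = 0; i < out->n; ++i)
            out->q[v][i] = in[i].q[v];
}

/* Hypothetical explicit update q <- q - dt * r for every variable.
 * Each inner loop is unit-stride with independent iterations, which lets
 * the compiler emit packed SIMD loads, arithmetic and stores. */
static void update_soa(cells_soa *cells, const cells_soa *res, const double *dt)
{
    assert(cells->n == res->n);
    for (int v = 0; v < NVAR; ++v) {
        double *restrict q = cells->q[v];
        const double *restrict r = res->q[v];
#pragma omp simd
        for (size_t i = 0; i < cells->n; ++i)
            q[i] -= dt[i] * r[i];
    }
}
```

With 256-bit AVX registers, for example, one iteration of the vectorised inner loop updates four consecutive cells of a single variable; the same loop in AoS form would interleave all five variables in each cache line.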
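
The abstract states that core parallelism relies on several domain decomposition techniques without detailing them. As one hypothetical example, a static one-dimensional decomposition hands each thread a contiguous, near-equal range of cells (or of blocks, in a multiblock mesh). The names `range` and `partition` below are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of cells or blocks owned by one thread. */
typedef struct {
    size_t begin, end;
} range;

/* Split n items into nthreads contiguous chunks whose sizes differ by at
 * most one; the first n % nthreads threads receive one extra item. */
static range partition(size_t n, size_t nthreads, size_t tid)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base = n / nthreads, extra = n % nthreads;
    range r;
    r.begin = tid * base + (tid < extra ? tid : extra);
    r.end = r.begin + base + (tid < extra ? 1 : 0);
    return r;
}
```

For instance, 10 cells over 3 threads yield the ranges [0, 4), [4, 7) and [7, 10). Contiguous ranges keep each thread's working set together in memory, which also makes it straightforward to place that data on the thread's local NUMA node.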
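
A widely used remedy for NUMA effects on multi-socket nodes, assumed here and not necessarily the exact optimisation used in the paper, is first-touch initialisation: arrays are initialised in parallel with the same static OpenMP schedule as the compute loops. Under a first-touch page placement policy (the Linux default), each page is then mapped to the memory of the socket whose thread will use it. The sketch assumes threads are pinned, for example via `OMP_PROC_BIND`; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n doubles and initialise them with a static OpenMP schedule,
 * so each page is first touched by the thread that will later compute on
 * it. Without OpenMP the pragmas are ignored and the loops run serially. */
static double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        a[i] = value;
    return a;
}

/* A compute loop using the identical schedule and iteration count, so in
 * practice each thread works on the index range (and the pages) it touched. */
static void scale(double *a, size_t n, double s)
{
    assert(a != NULL);
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        a[i] *= s;
}
```

A serial initialisation loop would instead place every page on the master thread's socket, forcing threads on the other socket to read remote memory across the inter-socket link.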
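
The Roofline model referenced in the abstract bounds the attainable floating-point throughput of a kernel by P = min(P_peak, I * B_peak), where I is the kernel's arithmetic intensity (FLOPs per byte of memory traffic) and B_peak is the sustainable memory bandwidth. A minimal C sketch follows; the function names and the machine figures in the usage note are hypothetical, not values measured in the paper.

```c
#include <assert.h>

/* Roofline bound: attainable performance is the lesser of the compute
 * ceiling and the bandwidth ceiling at the given arithmetic intensity.
 * Units: GFLOP/s, GB/s and FLOP/byte. */
static double roofline_gflops(double peak_gflops, double peak_gbs,
                              double flops_per_byte)
{
    assert(peak_gflops > 0.0 && peak_gbs > 0.0 && flops_per_byte > 0.0);
    double memory_bound = flops_per_byte * peak_gbs;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Fraction of the roofline bound achieved by a measured kernel. */
static double roofline_efficiency(double measured_gflops, double peak_gflops,
                                  double peak_gbs, double flops_per_byte)
{
    return measured_gflops /
           roofline_gflops(peak_gflops, peak_gbs, flops_per_byte);
}
```

For a hypothetical machine with a 100 GFLOP/s peak and 50 GB/s of bandwidth, the ridge point lies at 2 FLOP/byte. A kernel with an intensity of 0.5 FLOP/byte is memory bound at 25 GFLOP/s, so a measured 12.5 GFLOP/s would reach 50% of its roofline.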
