Performance improvement of the general-purpose CFD code FrontFlow/blue on the K computer

The general-purpose fluid simulation software FrontFlow/blue (FFB) is based on the finite element method (FEM). It was designed to handle extremely large-scale simulations and is an important application in the manufacturing field in Japan. Because of its significance both to manufacturing and to the development of the post-K supercomputer, it has been adopted as one of the important applications for the post-K supercomputer now under development. The K computer remains important infrastructure in Japan, and several other supercomputers share its architecture. We therefore continue to improve the performance of FFB on the K computer. On the significant subroutines, we applied several techniques: loop modification based on store order to reduce the total number of load and store operations, rerolling of unrolled loops so that SIMD load instructions can be used, adjusting the number of arrays referenced in a loop, use of the sector cache function, and others. As a result, an improvement of 160% in single-CPU performance was obtained. This paper presents and discusses the details of these improvements.
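
As a rough illustration of what a store-order-based loop modification can look like in an FEM code, the C sketch below contrasts an element-ordered accumulation with a node-ordered one. It is not taken from FFB: the two-node elements, the function names, and the arrays conn, contrib, row_ptr, and slot are all assumptions made for the example. In the first form every (element, local node) pair loads, updates, and stores an entry of y. In the second form the loop follows the order of the stored array, so each entry of y is accumulated in a local variable and stored once.

```c
#include <assert.h>
#include <stddef.h>

/* Element-ordered (scatter) form, hypothetical layout with 2 nodes per
   element. Each (element, local node) pair loads y, adds, and stores y. */
void accumulate_scatter(size_t n_elem, size_t n_node,
                        const size_t *conn,     /* n_elem * 2 node ids   */
                        const double *contrib,  /* n_elem * 2 values     */
                        double *y)              /* n_node results        */
{
    for (size_t n = 0; n < n_node; ++n)
        y[n] = 0.0;
    for (size_t e = 0; e < n_elem; ++e)
        for (size_t i = 0; i < 2; ++i)
            y[conn[2 * e + i]] += contrib[2 * e + i];
}

/* Store-ordered form. row_ptr/slot is an assumed CSR-style list: for
   node n, slot[row_ptr[n] .. row_ptr[n+1]-1] are the indices into
   contrib that belong to that node. Each y[n] is stored exactly once. */
void accumulate_store_order(size_t n_node,
                            const size_t *row_ptr,  /* n_node + 1 offsets */
                            const size_t *slot,
                            const double *contrib,
                            double *y)
{
    for (size_t n = 0; n < n_node; ++n) {
        /* CSR offsets must be non-decreasing. */
        assert(row_ptr[n] <= row_ptr[n + 1]);
        double sum = 0.0;  /* accumulate locally instead of in memory */
        for (size_t k = row_ptr[n]; k < row_ptr[n + 1]; ++k)
            sum += contrib[slot[k]];
        y[n] = sum;        /* single store per node */
    }
}
```

The second form trades the repeated load and store of y for an indirect read of contrib through slot, so whether it pays off depends on the mesh connectivity and the memory system. The sketch shows only the idea of following the store order.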
