A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements

Many supercomputers are shifting to architectures with low B/F ratios, where B is the memory transfer capability (byte/s) and F is the floating-point capability (FLOPS). However, exploiting the increased F is difficult for applications that inherently require large B. Targeting an implicit unstructured low-order finite-element analysis solver, which typically requires large B, we have developed a concentrated computation algorithm that yields significant performance improvements on low-B/F supercomputers. A sparse matrix-vector multiplication kernel achieved 35.7% of peak performance, and the whole solver achieved 15.6% of peak performance on the second-generation Xeon Phi-based Oakforest-PACS. This is 5.02 times faster than (and 6.90 times the peak performance of) the state-of-the-art solver (the SC14 Gordon Bell finalist solver). On Oakforest-PACS, the proposed solver was approximately 2.42 times faster than the state-of-the-art solver running on the K computer. The proposed approach has implications for systems and applications and is expected to have a significant impact on various fields that use finite-element methods for nonlinear time-evolution problems.
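To make the B/F argument concrete, the following is a minimal roofline-style sketch and not the solver described above. It assumes a generic CSR sparse matrix-vector product with 8-byte values and 4-byte column indices, counts only matrix traffic (vector traffic is ignored), and uses purely hypothetical B/F values. The function names and the `n_rhs` parameter are illustrative assumptions; multiplying several right-hand-side vectors per pass is shown only as one generic way to raise the flops performed per byte loaded.

```python
def spmv_intensity(n_rhs=1, value_bytes=8, index_bytes=4):
    """Flops per byte of matrix traffic for a CSR SpMV applied to n_rhs vectors.

    Assumption: each stored nonzero costs one multiply and one add per
    right-hand side, and is read from memory once (value + column index).
    """
    flops_per_nonzero = 2 * n_rhs
    bytes_per_nonzero = value_bytes + index_bytes
    return flops_per_nonzero / bytes_per_nonzero


def attainable_fraction_of_peak(intensity, bytes_per_flop):
    """Roofline bound: fraction of peak FLOPS reachable at a given B/F ratio."""
    return min(1.0, intensity * bytes_per_flop)


if __name__ == "__main__":
    # Hypothetical machine balances (B/F), chosen only for illustration.
    for bytes_per_flop in (0.5, 0.1):
        for n_rhs in (1, 4):
            frac = attainable_fraction_of_peak(spmv_intensity(n_rhs), bytes_per_flop)
            print(f"B/F={bytes_per_flop}, n_rhs={n_rhs}: bound = {frac:.1%} of peak")
```

Under these assumptions the bound scales linearly with both the B/F ratio and the number of vectors processed per pass, which is why a memory-bound kernel loses efficiency as B/F falls unless more computation is performed per byte loaded.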
