Solving the Klein-Gordon equation using Fourier spectral methods: a benchmark test for computer performance

The cubic Klein-Gordon equation is a simple but non-trivial partial differential equation whose numerical solution has the main building blocks required for the solution of many other partial differential equations. In this study, the library 2DECOMP&FFT is used in a Fourier spectral scheme to solve the Klein-Gordon equation, and strong scaling of the code is examined on thirteen different machines for a problem size of 512³. The results are useful in assessing the likely performance of other parallel fast Fourier transform based programs for solving partial differential equations. The problem is chosen to be large enough to solve on a workstation, yet also of interest to solve quickly on a supercomputer, in particular for parametric studies. Unlike the Linpack benchmark, a high ranking will not be obtained by simply building a bigger computer.
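
To make the building blocks mentioned above concrete (forward and inverse FFTs, a pointwise nonlinear term, and a time step applied in Fourier space), the following is a minimal serial sketch in one space dimension using NumPy. It is not the benchmark code: the benchmark is a parallel three-dimensional program built on 2DECOMP&FFT. The equation form u_tt - u_xx + u = sign * u^3, the semi-implicit three-level time discretization, the Taylor start-up step, and the names `kg_step` and `solve_kg_1d` are all assumptions made for illustration.

```python
import numpy as np


def kg_step(u_hat, u_hat_old, k2, dt, sign=1.0):
    """One step of u_tt - u_xx + u = sign * u^3 in Fourier space.

    Assumed scheme (illustrative, second order in time):
      (u_new - 2u + u_old)/dt^2 + (k^2 + 1)(u_new + 2u + u_old)/4 = sign * u^3
    """
    u = np.fft.ifft(u_hat).real            # back to physical space
    nl_hat = np.fft.fft(sign * u**3)       # nonlinear term, evaluated pointwise
    a = (k2 + 1.0) / 4.0                   # implicit linear operator, per mode
    rhs = (nl_hat
           + (2.0 / dt**2 - 2.0 * a) * u_hat
           - (1.0 / dt**2 + a) * u_hat_old)
    return rhs / (1.0 / dt**2 + a)


def solve_kg_1d(u0, v0, length, dt, n_steps, sign=1.0):
    """Integrate from u(x,0)=u0, u_t(x,0)=v0 on a periodic domain."""
    n = u0.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    k2 = k**2
    u_hat = np.fft.fft(u0)
    # Start-up: approximate u(x, -dt) with a second-order Taylor expansion,
    # using u_tt = u_xx - u + sign * u^3 evaluated at t = 0.
    utt_hat = -(k2 + 1.0) * u_hat + np.fft.fft(sign * u0**3)
    u_hat_old = u_hat - dt * np.fft.fft(v0) + 0.5 * dt**2 * utt_hat
    for _ in range(n_steps):
        u_hat, u_hat_old = kg_step(u_hat, u_hat_old, k2, dt, sign), u_hat
    return np.fft.ifft(u_hat).real


if __name__ == "__main__":
    x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    u = solve_kg_1d(1e-3 * np.cos(x), np.zeros_like(x), 2.0 * np.pi, 0.01, 100)
    print(u.max())
```

Each step costs one inverse and one forward transform plus pointwise work, so in a three-dimensional parallel setting the run time is likely to be dominated by the distributed FFTs and their all-to-all communication, which is why a problem of this kind is a plausible proxy for other FFT-based solvers.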
