Initial results on computational performance of Intel many integrated core, Sandy Bridge, and graphical processing unit architectures: implementation of a 1D C++/OpenMP electrostatic particle‐in‐cell code

We present initial comparative performance results for the Intel many integrated core (MIC), Sandy Bridge (SB), and graphical processing unit (GPU) architectures. A 1D explicit electrostatic particle‐in‐cell code is used to simulate a two‐stream instability in plasma. We compare the computation times for various numbers of cores/threads and compiler options. The parallelization is implemented via OpenMP with a maximum of 128 threads. Parallelization and vectorization on the GPU are achieved by modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto‐vectorization and optimization‐level compiler options. Our results show that the MIC is several times slower than SB for a single thread, but becomes faster than SB as the number of cores increases with vectorization switched on. The GPU is consistently about six to seven times faster than the MIC. Compared with SB, the GPU is about two times faster for a single thread and about an order of magnitude faster for 128 threads. The net speedup for the MIC and the GPU, however, is almost the same. An initial attempt to offload parts of the code to the MIC coprocessor shows that there is an optimal number of threads at which the speedup reaches a maximum. Copyright © 2014 John Wiley & Sons, Ltd.
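To illustrate the kind of loop that OpenMP parallelization targets in a 1D explicit electrostatic particle‐in‐cell code, the following is a minimal sketch of a particle push. It is not the authors' implementation: the function name `push_particles`, its parameters, the linear field interpolation, the periodic boundary, and the static schedule are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): advance all particles by one
// explicit step in a periodic 1D box of length L.
//   x  : particle positions, assumed to lie in [0, L)
//   v  : particle velocities
//   E  : electric field on E.size() equally spaced grid nodes
//   qm : charge-to-mass ratio, dt : time step
void push_particles(std::vector<double>& x, std::vector<double>& v,
                    const std::vector<double>& E,
                    double L, double qm, double dt)
{
    const long long np = static_cast<long long>(x.size());
    const std::size_t ng = E.size();
    const double dx = L / static_cast<double>(ng);

    // Each particle is independent, so the loop can be split across threads.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static)
    for (long long p = 0; p < np; ++p) {
        // Gather: linear interpolation of E from the two nearest nodes.
        const double s = x[p] / dx;
        const std::size_t i = static_cast<std::size_t>(s) % ng;
        const std::size_t j = (i + 1) % ng;
        const double w = s - std::floor(s);
        const double Ep = (1.0 - w) * E[i] + w * E[j];

        // Push: update velocity, then position with the new velocity.
        v[p] += qm * Ep * dt;
        x[p] += v[p] * dt;

        // Periodic wrap back into the box.
        x[p] -= L * std::floor(x[p] / L);
    }
}
```

With a compiler that supports OpenMP (for example, the `-fopenmp` flag in GCC), the thread count can be varied through the `OMP_NUM_THREADS` environment variable to compare timings of the kind discussed above. The charge deposition step, which writes to shared grid arrays, would need additional care (per-thread buffers or atomic updates) and is not shown here.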
