Accelerating with 512-bit SIMD: A case study for molecular dynamics simulation on Intel's Knights Corner

Multi-core and many-core designs combined with wide vector extensions have become the mainstream of modern processor architectures. Intel recently released Knights Corner, a many-core processor based on Intel's Many Integrated Core (MIC) architecture. Knights Corner comprises up to 62 cores, each of which supports 512-bit SIMD operations, that is, 8-way double-precision floating-point vector operations. In this paper, to analyze the practical effect of the 512-bit SIMD extension, we port a molecular dynamics application to Knights Corner using SIMD intrinsics and then apply optimizations such as loop unrolling and data prefetching. The experimental results demonstrate that our 512-bit SIMD implementation achieves nearly ideal SIMD speedups (up to 7.69x against a theoretical maximum of 8x) over the non-SIMD version for the force computation task.
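
The arithmetic behind "8-way" and "nearly ideal" can be made concrete with a minimal sketch. It is not the paper's implementation: the paper uses Knights Corner SIMD intrinsics, whereas the code below only emulates lane-blocked processing in plain Python. The Lennard-Jones kernel in reduced units, the cutoff mask, and all function names are assumptions chosen for illustration; the abstract does not name the potential used.

```python
# Illustrative sketch only: emulates 8-lane blocking in plain Python.
# The Lennard-Jones kernel and all names here are hypothetical; the
# abstract does not state which potential the authors use.

SIMD_BITS = 512      # vector register width stated in the abstract
DOUBLE_BITS = 64     # width of one double-precision value
LANES = SIMD_BITS // DOUBLE_BITS  # 8 doubles per vector operation


def simd_efficiency(speedup, lanes=LANES):
    """Fraction of the ideal SIMD speedup (ideal = number of lanes)."""
    return speedup / lanes


def lj_force_over_r(r2):
    """Assumed kernel: Lennard-Jones force divided by r, reduced units."""
    inv_r2 = 1.0 / r2
    inv_r6 = inv_r2 ** 3
    return 24.0 * inv_r2 * inv_r6 * (2.0 * inv_r6 - 1.0)


def forces_blocked(r2_values, cutoff2, lanes=LANES):
    """Process squared distances in blocks of `lanes`, as a vector unit would.

    A per-lane mask stands in for SIMD masking: lanes beyond the cutoff
    contribute zero instead of taking a branch. The final block may be
    shorter than `lanes` (the remainder case).
    """
    out = []
    for start in range(0, len(r2_values), lanes):
        block = r2_values[start:start + lanes]
        mask = [r2 < cutoff2 for r2 in block]
        out.extend(lj_force_over_r(r2) if keep else 0.0
                   for r2, keep in zip(block, mask))
    return out
```

With these definitions, `simd_efficiency(7.69)` evaluates to about 0.961, so the reported peak speedup corresponds to roughly 96% of the 8x ideal for the force computation.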
