Kernel optimization for short-range molecular dynamics

Abstract To optimize short-range force computations in Molecular Dynamics (MD) simulations, this paper presents multi-threading and SIMD optimizations. For the multi-threading optimization, a Partition-and-Separate-Calculation (PSC) method is designed to avoid the write conflicts caused by exploiting Newton's third law; it eliminates serial bottlenecks with no additional memory usage and is implemented using the OpenMP model. The PSC method is further deployed on Intel Xeon Phi coprocessors in both the native and offload models, and its performance is evaluated under different thread affinities on the MIC architecture. For the SIMD execution, we analyze how the "if-clause" of the cutoff-radius check affects the performance of the PSC method. The experimental results show that the PSC method is more efficient than some traditional methods; in double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.
