GPU-accelerated molecular dynamics clustering analysis with OpenACC

Abstract

This chapter explores the use of OpenACC directives to accelerate the calculation of a so-called dissimilarity matrix, the most costly computation required for clustering analysis of molecular dynamics simulations. By the end of this chapter, the reader will have an understanding of:

• Key algorithmic analysis steps that guide the decisions involved in application acceleration and parallelization, and that help set realistic performance targets for a successful OpenACC implementation on GPUs and other accelerators
• Differences in the development and maintenance of directive-based kernels and compiler autovectorization, as compared with data-parallel languages such as CUDA and OpenCL and with hand-written kernels based on compiler intrinsics and the like
• The step-by-step approach taken to adapt an existing molecular dynamics trajectory analysis algorithm for OpenACC
• Program and data structure transformations that benefit performance on GPU accelerators and on many-core CPUs that employ wide SIMD vector arithmetic units
• Performance tuning techniques that can help OpenACC compilers and runtime systems achieve better performance on target accelerators with typical architectural characteristics
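
To make the central computation concrete, the following is a minimal sketch of a directive-based dissimilarity matrix kernel, not the chapter's implementation. It assumes a hypothetical frame-major coordinate layout and uses the mean squared deviation between unaligned frames as a stand-in dissimilarity measure (taking the square root of each entry would give an RMSD); the function name `dissimilarity_matrix` and its parameters are illustrative only. It also fills the full square matrix rather than exploiting symmetry, which keeps the loop nest rectangular.

```c
#include <assert.h>
#include <stddef.h>

/* Fill the nframes x nframes matrix `dissim` (row-major) with the mean
 * squared deviation between every pair of frames.
 * `coords` holds nframes * natoms * 3 floats, frame-major (assumed layout).
 * No optimal superposition is performed in this sketch. */
void dissimilarity_matrix(const float *coords, int nframes, int natoms,
                          float *dissim)
{
    assert(coords != NULL && dissim != NULL && nframes > 0 && natoms > 0);

    const size_t frame_len = (size_t)natoms * 3;
    const size_t ncoords = (size_t)nframes * frame_len;
    const size_t nentries = (size_t)nframes * (size_t)nframes;

    /* Each (i, j) pair is independent, so the two outer loops are
     * collapsed into one parallel iteration space. Compilers without
     * OpenACC support ignore the pragmas and run the loops serially. */
    #pragma acc parallel loop collapse(2) \
        copyin(coords[0:ncoords]) copyout(dissim[0:nentries])
    for (int i = 0; i < nframes; i++) {
        for (int j = 0; j < nframes; j++) {
            const float *a = coords + (size_t)i * frame_len;
            const float *b = coords + (size_t)j * frame_len;
            float sum = 0.0f;

            /* Sequential accumulation over all coordinates of the pair. */
            #pragma acc loop seq
            for (size_t k = 0; k < frame_len; k++) {
                float d = a[k] - b[k];
                sum += d * d;
            }
            dissim[(size_t)i * nframes + j] = sum / (float)natoms;
        }
    }
}
```

A caller would pass a flat coordinate array and a preallocated output buffer of nframes × nframes floats. Because the matrix is symmetric with a zero diagonal, a tuned version could compute only one triangle, at the cost of a less regular iteration space.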
