Principal component analysis on a torus: Theory and application to protein dynamics.

A dimensionality reduction method for high-dimensional circular data is developed, which is based on a principal component analysis (PCA) of data points on a torus. Adopting a geometrical view of PCA, various distance measures on a torus are introduced and the associated problem of projecting data onto the principal subspaces is discussed. The main idea is that the (periodicity-induced) projection error can be minimized by transforming the data such that the maximal gap of the sampling is shifted to the periodic boundary. In a second step, the covariance matrix and its eigendecomposition can then be computed in the standard manner. Using molecular dynamics simulations of two well-established biomolecular systems (Aib9 and villin headpiece), the potential of the method to analyze the dynamics of backbone dihedral angles is demonstrated. The new approach allows for a robust and well-defined construction of metastable states and provides low-dimensional reaction coordinates that accurately describe the free energy landscape. Moreover, it offers a direct interpretation of covariances and principal components in terms of the angular variables. Beyond PCA, the method of maximal gap shifting is general and can be applied to any other dimensionality reduction method for circular data.
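The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `wrap`, `max_gap_shift` and `covariance` are hypothetical, angles are assumed to be in radians, and placing the cut at the centre of the largest gap between sorted samples is an assumption made here for simplicity. The eigendecomposition of the resulting covariance matrix is left to any standard linear algebra routine.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % TWO_PI - math.pi


def max_gap_shift(angles):
    """Rotate one circular coordinate so that its largest sampling gap
    lies on the periodic boundary at +/-pi.

    Returns (shifted_angles, cut), where cut is the centre of the
    largest gap (an assumption of this sketch).
    """
    pts = sorted(wrap(a) for a in angles)
    n = len(pts)
    best_gap, cut = -1.0, 0.0
    for i in range(n):
        lo = pts[i]
        # The last point is compared with the first one across the boundary.
        hi = pts[(i + 1) % n] + (TWO_PI if i == n - 1 else 0.0)
        if hi - lo > best_gap:
            best_gap, cut = hi - lo, lo + 0.5 * (hi - lo)
    # Subtracting cut + pi sends the cut itself to the boundary.
    return [wrap(a - cut - math.pi) for a in angles], cut


def covariance(columns):
    """Sample covariance matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
            for cb, mb in zip(columns, means)
        ]
        for ca, ma in zip(columns, means)
    ]


# Usage: samples clustered around +/-pi look widely spread before the shift.
raw = [3.0, 3.1, -3.1, -3.0]
shifted, cut = max_gap_shift(raw)
print(covariance([raw])[0][0], covariance([shifted])[0][0])
```

In this toy case the raw samples straddle the periodic boundary, so their naive variance is large even though they are close together on the circle. After the shift the boundary sits in the empty region and the variance reflects the actual spread. For multidimensional data the shift would be applied to each angular coordinate separately before the covariance matrix is computed.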
