Principal component analysis: a method for determining the essential dynamics of proteins.

Principal component analysis (PCA) is now commonly used to reveal the most important motions in proteins. Although most popular molecular dynamics packages provide PCA tools for analyzing protein trajectories, researchers often draw inferences from the results without a clear understanding of how to interpret them, and they are often unaware of the limitations and generalizations of such analysis. Here we review best practices for applying standard PCA, describe useful variants, discuss why one may wish to make comparison studies, and describe a set of metrics that make comparisons possible. In practice, one is often forced to make inferences about the essential dynamics of a protein with fewer samples than desired. Considerable attention is therefore given to judging the significance of results and to highlighting pitfalls. PCA is reviewed from the perspective of many practical considerations, and useful recipes are provided.
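
As a point of orientation, standard PCA of a trajectory can be sketched as an eigendecomposition of the coordinate covariance matrix. The following is a minimal illustration and not the procedure of any particular package: it assumes the frames have already been superimposed on a reference structure to remove overall translation and rotation, and the function name `pca_essential_modes`, the array layout, and the synthetic trajectory are all hypothetical choices made for this example.

```python
import numpy as np

def pca_essential_modes(coords, n_modes=2):
    """Covariance-based PCA of a trajectory (illustrative sketch).

    coords: array of shape (n_frames, 3 * n_atoms), assumed to be
            pre-aligned Cartesian coordinates (an assumption of this sketch).
    Returns the top eigenvalues, the corresponding modes (as columns),
    the fraction of total variance each captures, and the projections
    of every frame onto those modes.
    """
    X = np.asarray(coords, dtype=float)
    X_centered = X - X.mean(axis=0)            # remove the mean structure
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending order
    order = np.argsort(evals)[::-1]            # sort by descending variance
    evals, evecs = evals[order], evecs[:, order]
    frac = evals[:n_modes] / evals.sum()
    projections = X_centered @ evecs[:, :n_modes]
    return evals[:n_modes], evecs[:, :n_modes], frac, projections

# Synthetic "trajectory": one large collective motion plus small noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 4
direction = rng.normal(size=3 * n_atoms)
direction /= np.linalg.norm(direction)
amplitude = rng.normal(scale=3.0, size=(n_frames, 1))
noise = rng.normal(scale=0.1, size=(n_frames, 3 * n_atoms))
traj = amplitude * direction + noise

evals, modes, frac, proj = pca_essential_modes(traj, n_modes=2)
overlap = abs(modes[:, 0] @ direction)         # sign of a mode is arbitrary
```

In this toy case the first mode should recover the planted direction and account for nearly all of the variance. With real trajectories the eigenvalue spectrum is rarely this clean, which is why the significance of the leading modes has to be judged with care when sampling is limited.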
