Protein-Folding Analysis Using Features Obtained by Persistent Homology

Understanding the protein-folding process is an outstanding issue in biophysics, and recent developments in molecular dynamics simulation have provided insights into this phenomenon. However, the large number of degrees of freedom in atomic motion hinders the understanding of this process. In this study, we applied persistent homology, an emerging method for analyzing topological features in a data set, to reveal protein-folding dynamics. We developed what is, to our knowledge, a new method to characterize protein structure based on persistent homology and applied it to molecular dynamics simulations of chignolin. Combined with principal component analysis or nonnegative matrix factorization, our analysis method revealed two stable states and one saddle state, corresponding to the native, misfolded, and transition states, respectively. We also identified an unfolded state with slow dynamics in the reduced space. Our method serves as a promising tool for understanding the protein-folding process.
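As a rough illustration of how a single simulation frame might be turned into a topological feature vector, the sketch below computes only the 0-dimensional persistent homology of a point cloud and bins the resulting death scales into a histogram. It is not the feature construction used in this study, which the abstract does not specify. The function names `h0_death_times` and `death_histogram`, the toy coordinates, and the bin edges are all assumptions made for the example. The sketch relies on the standard fact that, for a Vietoris-Rips filtration, every connected component is born at scale 0 and the death scales coincide with the edge lengths of a minimum spanning tree.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death scales of 0-dimensional classes in a Vietoris-Rips filtration.

    Each point starts as its own component (birth at scale 0). Edges are
    added in order of increasing length (Kruskal's algorithm); whenever an
    edge merges two components, one of them dies at that edge length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 values, in nondecreasing order


def death_histogram(deaths, bin_edges):
    """Count death scales in half-open bins [edge_k, edge_{k+1})."""
    counts = [0] * (len(bin_edges) - 1)
    for d in deaths:
        for k in range(len(counts)):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts


# Hypothetical toy "frame": four points standing in for atom coordinates.
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (5.0, 0.0, 0.0)]
deaths = h0_death_times(frame)
features = death_histogram(deaths, [0.0, 2.0, 4.0, 6.0])
```

Stacking one such nonnegative vector per frame gives a matrix to which principal component analysis or nonnegative matrix factorization could then be applied. Higher-dimensional features such as loops and cavities require a dedicated persistent homology library and are outside the scope of this sketch.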
