Deep Autoencoders for Additional Insight into Protein Dynamics

Studying protein dynamics through the analysis of conformational transitions is a significant step toward understanding protein function. Molecular simulations can record large samples of protein transitions. However, extracting functional motions from these samples is still not automated and remains extremely time-consuming. In this paper we investigate the usefulness of unsupervised machine learning methods for uncovering relevant information about protein functional dynamics. We explore autoencoders to highlight their ability to learn relevant biological patterns, such as structural characteristics. The study aims to provide a better understanding of how protein conformational transitions evolve in time, within the larger framework of automatically detecting functional motions.
