A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning

Protein structure prediction is an important problem in computational biology, with applications in biomedical research such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach to protein structure reconstruction based on a deeply stacked denoising autoencoder. We applied the approach to template-based protein structure prediction, using only the 3D structural coordinates of homologous template proteins as input. Templates for a target protein were identified by a PSI-BLAST search. 3DRobot, a program that automatically generates diverse and well-packed protein structure decoys, was then used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on these decoys to obtain a deep learning model specific to the target protein, and the trained model was used to reconstruct the final structural model for the target sequence. On benchmark target proteins that have highly similar templates, the predicted structures achieve GDT-TS scores greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
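To make the reported GDT-TS threshold concrete, the following is a minimal sketch of a simplified GDT-TS calculation on the 0 to 1 scale used above. It is not the evaluation code used in this work. It assumes the model and native structures are given as matched N x 3 arrays of C-alpha coordinates, and it uses a single least-squares (Kabsch) superposition, whereas the standard GDT-TS searches over many superpositions to maximize the residue count at each cutoff, so this sketch can only underestimate the standard score. The function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rigidly superpose `mobile` onto `target` (both N x 3) by least squares."""
    mobile_centered = mobile - mobile.mean(axis=0)
    target_centered = target - target.mean(axis=0)
    # Covariance matrix between the two centered coordinate sets.
    covariance = mobile_centered.T @ target_centered
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mobile_centered @ rotation.T + target.mean(axis=0)


def gdt_ts_single_superposition(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS in [0, 1] from one global superposition.

    Assumption: rows of `model` and `native` are the same residues in the
    same order. The standard score optimizes the superposition per cutoff.
    """
    superposed = kabsch_superpose(model, native)
    distances = np.linalg.norm(superposed - native, axis=1)
    # Fraction of residues within each distance cutoff (in angstroms).
    fractions = [np.mean(distances <= cutoff) for cutoff in cutoffs]
    return float(np.mean(fractions))


# Usage: a rigidly rotated and translated copy of the native structure
# should score 1.0; a noisy copy should score somewhere in [0, 1].
rng = np.random.default_rng(0)
native = rng.normal(scale=10.0, size=(50, 3))
angle = 0.7
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
rigid_copy = native @ rot_z.T + np.array([5.0, -3.0, 2.0])
noisy_copy = rigid_copy + rng.normal(scale=3.0, size=native.shape)

score_rigid = gdt_ts_single_superposition(rigid_copy, native)
score_noisy = gdt_ts_single_superposition(noisy_copy, native)
```

Under these assumptions, a score above 0.7 means that, averaged over the four cutoffs, more than 70% of residues in the model lie within the cutoff distance of their native positions after superposition.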
