Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto‐encoder deep neural network

Because the distance between two neighbouring Cα atoms is nearly constant, the local backbone structure of proteins can be represented accurately by the angle formed by three consecutive Cα atoms, Cα(i−1), Cα(i), and Cα(i+1) (θ), and the dihedral angle of rotation about the Cα(i)–Cα(i+1) bond (τ). Because θ and τ reflect the structural properties of three to four amino‐acid residues, they offer a description of backbone conformation that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine‐learning technique for sequence‐based prediction of θ and τ angles. On an independent test set, the predicted angles have a mean absolute error of 9° for θ and 34° for τ, with a distribution on the θ‐τ plane close to that of native values. The average root‐mean‐square distance between 10‐residue fragment structures constructed from predicted θ and τ angles and their corresponding native structures is only 1.9 Å. Predicted θ and τ angles are expected to be complementary to predicted φ and ψ angles and secondary structures for use in model validation and in template‐based as well as template‐free structure prediction. The deep neural network learning technique is available as an on‐line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org. © 2014 Wiley Periodicals, Inc.
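As an illustration of the two quantities defined above, the following is a minimal sketch of how θ and τ could be computed from Cα coordinates. It is not the SPIDER predictor, which works from sequence. The function names (`theta`, `tau`, `ca_angles`), the use of degrees, and the sign convention for τ (the common IUPAC-style right-handed convention, with values in the range −180° to 180°) are assumptions made for this example and are not stated in the abstract.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def theta(p_prev, p, p_next):
    """Angle (degrees) at p formed by three consecutive C-alpha atoms."""
    u = _sub(p_prev, p)
    v = _sub(p_next, p)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def tau(p0, p1, p2, p3):
    """Dihedral (degrees) about the p1-p2 bond; sign convention is assumed."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = _norm(b2)
    b2_hat = (b2[0] / b2_len, b2[1] / b2_len, b2[2] / b2_len)
    x = _dot(n1, n2)
    y = _dot(b2_hat, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))

def ca_angles(coords):
    """Return (thetas, taus) for a list of C-alpha (x, y, z) coordinates.

    thetas[k] is the angle centred on atom k + 1;
    taus[k] is the dihedral about the bond between atoms k + 1 and k + 2.
    """
    n = len(coords)
    thetas = [theta(coords[i - 1], coords[i], coords[i + 1])
              for i in range(1, n - 1)]
    taus = [tau(coords[i - 1], coords[i], coords[i + 1], coords[i + 2])
            for i in range(1, n - 2)]
    return thetas, taus
```

For a chain of N Cα atoms this yields N − 2 values of θ and N − 3 values of τ, which matches the description of θ and τ as properties of three and four consecutive residues, respectively.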
