Single‐sequence‐based prediction of protein secondary structures and solvent accessibility by deep whole‐sequence learning

Predicting protein structure from sequence alone is challenging. As a result, most methods for protein structure prediction rely on evolutionary information from multiple sequence alignments. In previous work we showed that Long Short-Term Memory Bidirectional Recurrent Neural Networks (LSTM-BRNNs) improved on conventional neural networks by better capturing intra-sequence dependencies. Here we present SPIDER3-Single, a single-sequence-based prediction method employing LSTM-BRNNs, which consistently achieves a three-state secondary-structure accuracy (Q3) of 72.5% and a correlation coefficient of 0.67 between predicted and actual solvent accessible surface area. It also yields reasonably accurate predictions of eight-state secondary structure, main-chain angles (backbone ϕ and ψ torsion angles and Cα-atom-based θ and τ angles), half-sphere exposure, and contact number. The method is more accurate than the corresponding evolution-based method for proteins with few sequence homologs, and it is computationally efficient enough for large-scale screening of protein structural properties. It is available as an option on the SPIDER3 server and as a standalone version for download at http://sparks-lab.org. © 2018 Wiley Periodicals, Inc.
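
The two headline figures are standard per-residue metrics. As a minimal sketch (not the SPIDER3-Single evaluation code), Q3 is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly, and the solvent-accessibility figure is Pearson's correlation between predicted and observed values. The reduction of the eight DSSP states to three used here (H, G, I to H; E, B to E; everything else to C) is one common convention and is an assumption, as are the helper names `to_three_state`, `q3`, and `pearson`.

```python
from math import sqrt
from typing import Sequence

# Assumed 8-to-3 reduction of DSSP states (H, G, I -> H; E, B -> E; rest -> C).
# This is one common convention; other reductions exist.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss8: str) -> str:
    """Reduce an eight-state DSSP string to three states (H, E, C)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in ss8)


def q3(predicted: str, actual: str) -> float:
    """Fraction of residues whose three-state label is predicted correctly."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("labels must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, e.g. between predicted and actual ASA."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(sxx * syy)


if __name__ == "__main__":
    actual = to_three_state("HHHGTTEEEBS-")  # reduces to "HHHHCCEEEECC"
    predicted = "HHHHCCEEECCC"               # toy prediction: one strand residue missed
    print(f"Q3 = {q3(predicted, actual):.3f}")  # 11 of 12 residues correct
    # Toy ASA values in square angstroms, not real data.
    print(f"PCC = {pearson([10.0, 55.0, 120.0, 80.0], [15.0, 40.0, 130.0, 70.0]):.3f}")
```

The same `q3` function applied to the unreduced eight-state strings gives the eight-state accuracy (Q8).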

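Several of the structural targets listed in the abstract can be computed directly from Cα coordinates. The sketch below is illustrative only, and its conventions are assumptions rather than the paper's definitions: θ is taken as the Cα(i-1), Cα(i), Cα(i+1) bond angle and τ as the dihedral of Cα(i-1), Cα(i), Cα(i+1), Cα(i+2) (rotation about the Cα(i)-Cα(i+1) bond); the contact number counts Cα atoms within an assumed 13 Å radius; and half-sphere exposure splits that sphere with a pseudo-Cβ direction built from the two neighbouring Cα atoms. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def bond_angle(p0: Vec, p1: Vec, p2: Vec) -> float:
    """Angle p0-p1-p2 at vertex p1, in degrees."""
    u, v = sub(p0, p1), sub(p2, p1)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral p0-p1-p2-p3 about the p1-p2 axis, in degrees (IUPAC sign)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = norm(b2) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def theta_tau(ca: Sequence[Vec]) -> List[Tuple[Optional[float], Optional[float]]]:
    """Per-residue (theta, tau); None where the required neighbours are missing."""
    n = len(ca)
    out: List[Tuple[Optional[float], Optional[float]]] = []
    for i in range(n):
        theta = bond_angle(ca[i - 1], ca[i], ca[i + 1]) if 0 < i < n - 1 else None
        tau = dihedral(ca[i - 1], ca[i], ca[i + 1], ca[i + 2]) if 0 < i < n - 2 else None
        out.append((theta, tau))
    return out


def contact_number(ca: Sequence[Vec], i: int, radius: float = 13.0) -> int:
    """Count the other Cα atoms within `radius` angstroms of Cα(i)."""
    return sum(1 for j, p in enumerate(ca) if j != i and norm(sub(p, ca[i])) < radius)


def half_sphere_exposure(ca: Sequence[Vec], i: int, radius: float = 13.0) -> Tuple[int, int]:
    """(HSE-up, HSE-down) for an interior residue, using a pseudo-Cβ direction."""
    if not 0 < i < len(ca) - 1:
        raise ValueError("HSE needs both sequence neighbours of residue i")
    u, v = sub(ca[i], ca[i - 1]), sub(ca[i], ca[i + 1])
    # Bisector of the two unit vectors pointing away from the neighbouring Cα atoms.
    side = tuple(a / norm(u) + b / norm(v) for a, b in zip(u, v))
    up = down = 0
    for j, p in enumerate(ca):
        d = sub(p, ca[i])
        if j == i or norm(d) >= radius:
            continue
        if dot(d, side) > 0:
            up += 1
        else:
            down += 1
    return up, down


if __name__ == "__main__":
    # Toy coordinates with unit spacing, not a real protein chain.
    ca = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(theta_tau(ca))
    print(contact_number(ca, 1), half_sphere_exposure(ca, 1))
```

The same `dihedral` helper gives the backbone torsions when fed backbone atoms instead of Cα atoms: C(i-1), N(i), Cα(i), C(i) for ϕ and N(i), Cα(i), C(i), N(i+1) for ψ. The pseudo-Cβ bisector lets half-sphere exposure be computed without side-chain atoms; a Cβ-based variant would use the Cα-to-Cβ vector instead.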