A New Deep Neighbor Residual Network for Protein Secondary Structure Prediction

A protein secondary structure defines the local conformation of the protein's polypeptide backbone, which provides important information for protein 3D structure prediction and protein functions. In this study, a new deep neural network, the deep neighbor residual network (DeepNRN), is proposed for protein secondary structure predictions. The network takes three types of inputs, namely protein sequence features, profile features generated by PSI-BLAST, and profile features generated by HHBlits, and predicts the protein secondary structure in either one of eight states (Q8) or one of three states (Q3). The basic building block of the network, the neighbor residual unit, is designed with two types of short-cut connections that are more general and expressive than residual units in existing residual deep neural networks, yet can still be computed efficiently. In addition, the prediction result of DeepNRN can be refined by a Struct2Struct network to make the result more protein-like. Extensive experimental results on multiple widely used benchmark data sets show that the new DeepNRN-based method outperformed existing methods and obtained the best results across multiple data sets.

[1]  Kuldip K. Paliwal,et al.  Capturing non‐local interactions by long short‐term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility , 2017, Bioinform..

[2]  James G. Lyons,et al.  Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning , 2015, Scientific Reports.

[3]  D. Fischer,et al.  Protein fold recognition using sequence‐derived predictions , 1996, Protein science : a publication of the Protein Society.

[4]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[5]  Alexey Drozdetskiy,et al.  JPred4: a protein secondary structure prediction server , 2015, Nucleic Acids Res..

[6]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  B. Rost,et al.  Improved prediction of protein secondary structure by use of sequence profiles and neural networks. , 1993, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Christian Cole,et al.  JPred4: a protein secondary structure prediction server , 2015, Nucleic Acids Res..

[9]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[10]  Jian Peng,et al.  Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields , 2015, Scientific Reports.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[13]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[14]  Ke Zhang,et al.  Residual Networks of Residual Networks: Multilevel Residual Networks , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[15]  Zhen Li,et al.  Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks , 2016, IJCAI.

[16]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[17]  Navdeep Jaitly,et al.  Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction , 2017, ArXiv.

[18]  M. Gromiha,et al.  Real value prediction of solvent accessibility from amino acid sequence , 2003, Proteins.

[19]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[20]  Geoffrey E. Hinton,et al.  On rectified linear units for speech processing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Pierre Baldi,et al.  SCRATCH: a protein structure and structural feature prediction server , 2005, Nucleic Acids Res..

[22]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[23]  J. Skolnick,et al.  Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm , 2004, Proteins.

[24]  Yaoqi Zhou,et al.  SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. , 2005, Bioinformatics.

[25]  Kuldip K. Paliwal,et al.  Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins , 2016, Bioinform..

[26]  Yaoqi Zhou,et al.  Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations. , 2011, Journal of molecular biology.

[27]  Aleksey A. Porollo,et al.  Accurate prediction of solvent accessibility using neural networks–based regression , 2004, Proteins.

[28]  Xin Deng,et al.  MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts , 2011, BMC Bioinformatics.

[29]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[30]  Alan Wee-Chung Liew,et al.  Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines , 2016, J. Chem. Inf. Model..

[31]  A. Godzik,et al.  Computational protein function prediction: Are we making progress? , 2007, Cellular and Molecular Life Sciences.

[32]  Christopher J. Oldfield,et al.  Intrinsic disorder and functional proteomics. , 2007, Biophysical journal.

[33]  David Baker,et al.  Protein Structure Prediction Using Rosetta , 2004, Numerical Computer Methods, Part D.

[34]  J. Skolnick,et al.  Ab initio modeling of small proteins by iterative TASSER simulations , 2007, BMC Biology.

[35]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[37]  Ehsaneddin Asgari,et al.  Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics , 2015, PloS one.

[38]  Yaoqi Zhou,et al.  Achieving 80% ten‐fold cross‐validated accuracy for secondary structure prediction by large‐scale training , 2006, Proteins.

[39]  L. Kier,et al.  Amino acid side chain parameters for correlation studies in biology and pharmacology. , 2009, International journal of peptide and protein research.

[40]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[41]  B. Rost,et al.  Protein flexibility and rigidity predicted from sequence , 2005, Proteins.

[42]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[43]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[44]  Gert Vriend,et al.  A series of PDB related databases for everyday needs , 2010, Nucleic Acids Res..

[45]  Pierre Baldi,et al.  SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity , 2014, Bioinform..

[46]  Jian Zhou,et al.  Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction , 2014, ICML.

[47]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[48]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..