CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway

Background: Protein secondary structure is the three-dimensional form of local segments of a protein, and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent.

Results: We present a novel deep-learning-based model, referred to as CNNH_PSS, that uses a multi-scale CNN with highways. In CNNH_PSS, every pair of neighboring convolutional layers has a highway that delivers information from the current layer to the output of the next one in order to preserve local contexts. Because lower layers extract local contexts while higher layers extract long-range interdependencies, the highways between neighboring layers allow CNNH_PSS to extract both. We evaluate CNNH_PSS on two commonly used datasets, CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highways by at least 0.010 Q8 accuracy. It also outperforms CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are useful for prediction. Furthermore, CNNH_PSS performs better than GSM and DCRNN, which need an extra complex model to extract long-range interdependencies. This demonstrates that CNNH_PSS not only requires fewer computational resources but also achieves better prediction performance.

Conclusion: CNNH_PSS can extract both local contexts and long-range interdependencies by combining a multi-scale CNN with a highway network. The evaluations on common datasets and the comparisons with state-of-the-art methods indicate that CNNH_PSS is a useful and efficient tool for protein secondary structure prediction.
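
The idea of a highway between neighboring convolutional layers can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes the standard highway gate y = t * H(x) + (1 - t) * x, a simplified per-channel gate, and hypothetical kernel sizes 3 and 5 for the two scales. The names `conv1d_same`, `highway_layer` and `random_scale` are illustrative only.

```python
import math
import random

def conv1d_same(seq, weights, bias):
    """1D convolution with zero 'same' padding, followed by ReLU.

    seq:     list of L positions, each a list of c_in floats
    weights: weights[o][j][i] for output channel o, kernel offset j, input channel i
    bias:    one float per output channel
    """
    k = len(weights[0])
    half = k // 2
    length = len(seq)
    out = []
    for pos in range(length):
        row = []
        for o, kernel in enumerate(weights):
            total = bias[o]
            for j in range(k):
                src = pos + j - half
                if 0 <= src < length:  # positions outside the sequence act as zeros
                    total += sum(w * x for w, x in zip(kernel[j], seq[src]))
            row.append(max(0.0, total))
        out.append(row)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(seq, scales, gate_w, gate_b):
    """One multi-scale conv layer whose input is carried forward by a highway."""
    # Multi-scale: run one convolution per kernel size, then concatenate channels.
    branches = [conv1d_same(seq, w, b) for w, b in scales]
    conv_out = [sum((br[pos] for br in branches), []) for pos in range(len(seq))]
    mixed = []
    for x, h in zip(seq, conv_out):
        # Assumed per-channel gate t; (1 - t) lets the layer input pass through.
        gates = [sigmoid(gw * xi + gb) for gw, xi, gb in zip(gate_w, x, gate_b)]
        mixed.append([t * hi + (1.0 - t) * xi for t, hi, xi in zip(gates, h, x)])
    return mixed

def random_scale(rng, c_in, c_out, k):
    """Random weights and zero biases for one kernel size (illustration only)."""
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)]
                for _ in range(k)] for _ in range(c_out)]
    return weights, [0.0] * c_out

rng = random.Random(0)
channels = 4
# Toy input: 10 residues, each described by 4 made-up feature values.
seq = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(10)]
# Two scales, each producing half of the channels so the highway shapes match.
scales = [random_scale(rng, channels, channels // 2, k) for k in (3, 5)]
out = highway_layer(seq, scales, gate_w=[0.0] * channels, gate_b=[0.0] * channels)
print(len(out), len(out[0]))  # sequence length and channel count are preserved
```

Stacking several such layers widens the receptive field for long-range interdependencies, while the (1 - t) path keeps the lower-layer local context available at each output. A strongly negative gate bias makes a layer behave like the identity, and a strongly positive one like a plain multi-scale convolution.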
