Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

[1]  Keehyoung Joo,et al.  proteins STRUCTURE O FUNCTION O BIOINFORMATICS SANN: Solvent accessibility prediction of proteins , 2022 .

[2]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[3]  Bernard F. Buxton,et al.  The DISOPRED server for the prediction of protein disorder , 2004, Bioinform..

[4]  M. Sternberg,et al.  Prediction of protein secondary structure and active sites using the alignment of homologous sequences. , 1987, Journal of molecular biology.

[5]  Zhiyong Wang,et al.  MRFalign: Protein Homology Detection through Alignment of Markov Random Fields , 2014, PLoS Comput. Biol..

[6]  Jian Zhou,et al.  Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction , 2014, ICML.

[7]  Daniel W. A. Buchan,et al.  A large-scale evaluation of computational protein function prediction , 2013, Nature Methods.

[8]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[9]  Christian Cole,et al.  JPred4: a protein secondary structure prediction server , 2015, Nucleic Acids Res..

[10]  Inessa G. Im Predicting protein secondary structure using Markov chain Monte-Carlo simulation , 2008 .

[11]  Wei-Mou Zheng The Use of a Conformational Alphabet for Fast Alignment of Protein Structures , 2008, ISBRA.

[12]  Wei Chu,et al.  A graphical model for protein secondary structure prediction , 2004, ICML.

[13]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[14]  Michael L. Littman,et al.  Proceedings of the 26th Annual International Conference on Machine Learning , 2009, ICML 2009.

[15]  J. Whisstock,et al.  Prediction of protein function from protein sequence and structure , 2003, Quarterly Reviews of Biophysics.

[16]  Kuang Lin,et al.  A simple and fast secondary structure prediction method using hidden neural networks , 2005, Bioinform..

[17]  Pierre Baldi,et al.  Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles , 2002, Proteins.

[18]  Zheng Wei-Mou,et al.  Fast Multiple Alignment of Protein Structures Using Conformational Letter Blocks , 2009 .

[19]  Sheng Wang,et al.  ClEPaps: Fast Pair Alignment of protein Structures Based on conformational Letters , 2007, J. Bioinform. Comput. Biol..

[20]  Jian Peng,et al.  Conditional Neural Fields , 2009, NIPS.

[21]  Burkhard Rost,et al.  The PredictProtein server , 2003, Nucleic Acids Res..

[22]  Terrence G. Oas,et al.  Preorganized secondary structure as an important determinant of fast protein folding , 2001, Nature Structural Biology.

[23]  Anna Tramontano,et al.  Critical assessment of methods of protein structure prediction (CASP) — round x , 2014, Proteins.

[24]  Lukasz A. Kurgan,et al.  SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles , 2012, J. Comput. Chem..

[25]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[26]  Hyunsoo Kim,et al.  Protein secondary structure prediction based on an improved support vector machines approach. , 2003, Protein engineering.

[27]  K. Dill,et al.  The Protein-Folding Problem, 50 Years On , 2012, Science.

[28]  David A. Lee,et al.  Predicting protein function from sequence and structure , 2007, Nature Reviews Molecular Cell Biology.

[29]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[30]  K. Nishikawa,et al.  Predicting absolute contact numbers of native protein structure from amino acid sequence , 2004, Proteins.

[31]  Jian Peng,et al.  A conditional neural fields model for protein threading , 2012, Bioinform..

[32]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[33]  Zhiyong Wang,et al.  Protein 8-class secondary structure prediction using Conditional Neural Fields , 2010, 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[34]  T L Blundell,et al.  Protein structure--based drug design. , 1994, Annual review of biophysics and biomolecular structure.

[35]  L. Pauling,et al.  The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. , 1951, Proceedings of the National Academy of Sciences of the United States of America.

[36]  Yücel Altunbasak,et al.  Protein secondary structure prediction for a single-sequence using hidden semi-Markov models , 2006, BMC Bioinformatics.

[37]  Jian Peng,et al.  Template-based protein structure modeling using the RaptorX web server , 2012, Nature Protocols.

[38]  Gregory A.Petsko and Dagmar Ringe Protein structure and function , 2003 .

[39]  Yaoqi Zhou,et al.  Improving the prediction accuracy of residue solvent accessibility and real‐value backbone torsion angles of proteins by guided‐learning through a two‐layer neural network , 2009, Proteins.

[40]  Alexey G. Murzin,et al.  SCOP2 prototype: a new approach to protein structure mining , 2014, Nucleic Acids Res..

[41]  Zsuzsanna Dosztányi,et al.  IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content , 2005, Bioinform..

[42]  C Kooperberg,et al.  Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. , 1997, Journal of molecular biology.

[43]  B. Rost,et al.  A modified definition of Sov, a segment‐based measure for protein secondary structure prediction assessment , 1999, Proteins.

[44]  Jinbo Xu,et al.  A position-specific distance-dependent statistical potential for protein structure and functional study. , 2012, Structure.

[45]  Geoffrey J. Barton,et al.  JPred : a consensus secondary structure prediction server , 1999 .

[46]  Hu Chen,et al.  A novel method for protein secondary structure prediction using dual‐layer SVM and profiles , 2004, Proteins.

[47]  Giovanni Soda,et al.  Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..

[48]  Joanna Schaffhausen Advances in structure-based drug design. , 2012, Trends in pharmacological sciences.

[49]  Albert Y. Zomaya,et al.  Machine Learning Techniques for Protein Secondary Structure Prediction:An Overview and Evaluation , 2008 .

[50]  Pierre Baldi,et al.  SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity , 2014, Bioinform..

[51]  B. Rost,et al.  Prediction of protein secondary structure at better than 70% accuracy. , 1993, Journal of molecular biology.

[52]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[53]  Bernard F. Buxton,et al.  Secondary structure prediction with support vector machines , 2003, Bioinform..

[54]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[55]  Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11-13, 2011 , 2011, AISTATS.

[56]  R Langridge,et al.  Improvements in protein secondary structure prediction by an enhanced neural network. , 1990, Journal of molecular biology.

[57]  P. Dobson,et al.  Length preferences and periodicity in β‐strands. Antiparallel edge β‐sheets are more likely to finish in non‐hydrogen bonded rings , 2003 .

[58]  Yang Zhang,et al.  I-TASSER server for protein 3D structure prediction , 2008, BMC Bioinformatics.

[59]  Satoru Hayamizu,et al.  Prediction of protein secondary structure by the hidden Markov model , 1993, Comput. Appl. Biosci..

[60]  Douglas L. Brutlag,et al.  Bayesian Segmentation of Protein Secondary Structure , 2000, J. Comput. Biol..

[61]  Jianzhu Ma,et al.  Protein structure alignment beyond spatial proximity , 2013, Scientific Reports.

[62]  S. Hua,et al.  A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. , 2001, Journal of molecular biology.

[63]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[64]  Feng Zhao,et al.  Protein threading using context-specific alignment potential , 2013, Bioinform..

[65]  Lubert Stryer,et al.  Protein structure and function , 2005, Experientia.

[66]  Anna Tramontano,et al.  Assessment of the assessment: Evaluation of the model quality estimates in CASP10 , 2014, Proteins.

[67]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[68]  Max Welling,et al.  Hidden-Unit Conditional Random Fields , 2011, AISTATS.

[69]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[70]  Pascal Benkert,et al.  QMEAN server for protein model quality estimation , 2009, Nucleic Acids Res..

[71]  M. Karplus,et al.  Protein secondary structure prediction with a neural network. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[72]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[73]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[74]  David A. Lee,et al.  CATH: comprehensive structural and functional annotations for genome sequences , 2014, Nucleic Acids Res..

[75]  S Brunak,et al.  Protein secondary structure: category assignment and predictability , 2001, FEBS letters.

[76]  S. Bryant,et al.  Critical assessment of methods of protein structure prediction (CASP): Round II , 1997, Proteins.

[77]  Pierre Baldi,et al.  Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data , 2005, Data Mining and Knowledge Discovery.

[78]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[79]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[80]  D. Higgins,et al.  See Blockindiscussions, Blockinstats, Blockinand Blockinauthor Blockinprofiles Blockinfor Blockinthis Blockinpublication Clustal: Blockina Blockinpackage Blockinfor Blockinperforming Multiple Blockinsequence Blockinalignment Blockinon Blockina Minicomputer Article Blockin Blockinin Blockin , 2022 .

[81]  Jianlin Cheng,et al.  A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction , 2015, IEEE/ACM Transactions on Computational Biology and Bioinformatics.