Secondary structure specific simpler prediction models for protein backbone angles

Motivation: Protein backbone angle prediction has become significantly more accurate with the development of deep learning methods. Usually, the same deep learning model is used to make predictions for all residues, regardless of the secondary structure category they belong to. In this paper, we propose to train a separate deep learning model for each category of secondary structure. Machine learning methods strive to achieve generality over the training examples and consequently lose accuracy. In this work, we explicitly exploit classification knowledge to restrict generalisation to within the specific class of training examples. This is to compensate for the loss of generalisation by exploiting specialisation knowledge in an informed way.

Results: The new method, named SAP4SS, obtains mean absolute error (MAE) values of 15.59, 18.87, 6.03, and 21.71 for the four types of backbone angles $\phi$, $\psi$, $\theta$, and $\tau$, respectively. Consequently, SAP4SS significantly outperforms the existing state-of-the-art methods SAP, OPUS-TASS, and SPOT-1D: the improvements in MAE across the four types of angles range from 1.5% to 4.1% compared to the best known results.

Availability: SAP4SS, along with its data, is available from https://gitlab.com/mahnewton/sap4ss.
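The idea of routing each residue to a model specific to its secondary structure class, and of scoring the result with MAE, can be illustrated with a minimal Python sketch. This is not the SAP4SS implementation: the three-class labels (`H`, `E`, `C`), the constant "models", the helper names, and the toy angle values are all hypothetical stand-ins, and the sketch assumes angles are in degrees and that the absolute error is taken on the circle (so that 179 and -179 differ by 2, not 358).

```python
from typing import Callable, Dict, Sequence


def angular_abs_error(pred: float, true: float) -> float:
    """Absolute difference between two angles in degrees, respecting the 360-degree wrap."""
    d = abs(pred - true) % 360.0
    return min(d, 360.0 - d)


def angular_mae(preds: Sequence[float], trues: Sequence[float]) -> float:
    """Mean of the wrapped absolute errors over all residues."""
    return sum(angular_abs_error(p, t) for p, t in zip(preds, trues)) / len(preds)


def predict_by_class(
    features: Sequence[float],
    ss_labels: Sequence[str],
    models: Dict[str, Callable[[float], float]],
) -> list:
    """Send each residue to the model trained for its secondary structure class."""
    return [models[label](x) for x, label in zip(features, ss_labels)]


# Hypothetical per-class predictors: constants standing in for trained networks.
toy_models = {
    "H": lambda x: -60.0,   # helix
    "E": lambda x: -120.0,  # strand
    "C": lambda x: -80.0,   # coil
}

toy_features = [0.0, 0.0, 0.0]         # placeholder inputs, ignored by the toy models
toy_labels = ["H", "E", "C"]           # assumed secondary structure classes
toy_true_phi = [-63.0, -119.0, 175.0]  # made-up target angles

toy_pred_phi = predict_by_class(toy_features, toy_labels, toy_models)
toy_mae = angular_mae(toy_pred_phi, toy_true_phi)
```

In practice, each entry of `toy_models` would be a network trained only on residues of that class, and the class labels at prediction time would themselves come from a secondary structure predictor rather than from known structure.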
