TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

The protein backbone torsion angles Phi and Psi describe rotations around the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the peptide bonds linking residues are rigid and planar, these two angles essentially determine the backbone geometry of a protein. Accurate prediction of backbone torsion angles from sequence information can therefore assist protein structure prediction. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction from a variety of sequence-derived features, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those obtained with ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE performs significantly better than a random predictor built on an amino acid-specific basis, with p-values below 1.46e-147 and 7.97e-150 for Phi and Psi, respectively, by the Wilcoxon signed rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and in assisting protein fold recognition, where the predicted torsion angles can serve as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
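To make the evaluation metric concrete, the following is a minimal sketch of how a mean absolute error over torsion angles might be computed. It assumes angles are given in degrees and that errors are wrapped to account for the 360° periodicity of angles; this wrapping is one common convention and is an assumption here, not necessarily the exact definition used in the study. The function names `angular_abs_error` and `mean_absolute_error` are hypothetical and chosen for illustration only.

```python
def angular_abs_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180).

    Assumption: errors are wrapped around the 360-degree period, so that
    170 and -170 degrees are treated as 20 degrees apart, not 340.
    """
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error(preds, trues):
    """Mean of the wrapped absolute errors over paired angle lists."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_abs_error(p, t) for p, t in zip(preds, trues))
    return total / len(preds)
```

For example, under these assumptions a prediction of 170° against a true value of -170° contributes an error of 20°, which keeps residues near the ±180° boundary from dominating the average.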
