Protein Disorder Prediction using Jumping Motifs from Torsion Angles Dynamics in Ramachandran Plots

Disordered proteins are functional proteins that do not fold into a fixed 3D structure. Predicting order and disorder in protein sequences is an important task given the biological roles of disordered proteins. In the last decade, many computational methods have been proposed for disorder identification, but the most accurate strategies currently depend on sequence alignment against large protein databases, which makes them slow and hard to apply to proteome-wide analysis. This paper proposes an approach that links amino acid sequences with the transition tendencies of their dihedral torsion angles. The aim is to characterize the dynamic angle variations along the protein chain as a measure of amino acid flexibility and of its connection with the disorder state. The features are estimated from empirical propensities computed from Ramachandran plots. Classification is performed with structured learning in the form of Conditional Random Fields (CRFs). Performance is evaluated in terms of the Area Under the ROC Curve (AUC) and three performance metrics suited to unbalanced classification problems. The results show that the proposed method outperforms the most referenced alignment-free predictors and is competitive with the slower, more mature alignment-based methods.
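As a rough illustration of the evaluation described above, the following Python sketch computes AUC for per-residue disorder scores using its rank interpretation (the probability that a randomly chosen disordered residue scores higher than a randomly chosen ordered one). The abstract does not name the three metrics for unbalanced problems, so balanced accuracy and the Matthews correlation coefficient appear here only as assumed, commonly used examples; the function names and the 0.5 threshold are likewise hypothetical and are not taken from the paper.

```python
import math


def auc_score(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn), calling a residue disordered when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity (assumed example metric)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient (assumed example metric); 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy data: 1 = disordered residue, 0 = ordered residue (made-up values).
    labels = [1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
    counts = confusion_counts(labels, scores)
    print(auc_score(labels, scores), balanced_accuracy(*counts), mcc(*counts))
```

AUC is threshold-free, which is why it is often paired with threshold-dependent metrics when ordered residues greatly outnumber disordered ones; the pairwise loop is quadratic and is meant for clarity rather than for proteome-scale data.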
