Protein Disorder Prediction using Information Theory Measures on the Distribution of the Dihedral Torsion Angles from Ramachandran Plots

This paper addresses the problem of order/disorder prediction in protein sequences using alignment-free methods. The proposed approach is based on a set of 11 information theory measures estimated from the distribution of the dihedral torsion angles in the amino acid chain. The aim is to characterize the energetically allowed regions for amino acids in protein structures, as a way of measuring the rigidity/flexibility of every amino acid in the chain and the effect of that rigidity on disorder propensity. The features are estimated from empirical Ramachandran plots obtained using the Protein Geometry Database. The proposed features are used in conjunction with well-established, state-of-the-art features for disorder prediction. Classification is performed using two different strategies: one based on conventional supervised methods and the other based on structural learning. Performance is evaluated in terms of the area under the ROC curve (AUC) and three performance metrics suited to unbalanced classification problems. The results show that the proposed scheme using conventional supervised methods achieves results comparable to those of well-known alignment-free methods for disorder prediction. Moreover, the scheme based on structural learning outperforms all the methods evaluated, including three alignment-based methods.
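
As an illustration of how an information theory measure can be estimated from a distribution of dihedral angles, the sketch below computes the Shannon entropy of a binned set of (phi, psi) pairs. The abstract does not list the 11 measures, so Shannon entropy is used here only as an assumed representative example. The function name `ramachandran_entropy`, the 30-degree bin width, and the sample angles are hypothetical and are not taken from the paper or from the Protein Geometry Database.

```python
import math
from collections import Counter


def ramachandran_entropy(angles, bin_width=30.0):
    """Shannon entropy (in bits) of a binned (phi, psi) distribution.

    angles: iterable of (phi, psi) pairs in degrees, each in [-180, 180].
    bin_width: hypothetical bin size in degrees (an assumption, not from the paper).
    """
    n_bins = int(360 / bin_width)

    def bin_index(angle):
        # Map [-180, 180] onto 0..n_bins-1, folding +180 into the last bin.
        return min(int((angle + 180.0) // bin_width), n_bins - 1)

    # Empirical 2D histogram over the Ramachandran plane.
    counts = Counter((bin_index(phi), bin_index(psi)) for phi, psi in angles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one (phi, psi) pair is required")

    # H = -sum(p * log2(p)) over the occupied bins.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up observations: one residue type confined to a single region,
# another spread evenly over four regions.
rigid = [(-60.0, -45.0)] * 8
flexible = [(-150.0, 150.0), (-60.0, -45.0), (60.0, 45.0), (-90.0, 0.0)]

print(ramachandran_entropy(rigid))
print(ramachandran_entropy(flexible))
```

Under this reading, a low entropy corresponds to an amino acid confined to a narrow allowed region (more rigid), while a higher entropy corresponds to one that samples many regions (more flexible). How the paper actually bins the plots and which measures it uses may differ.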
