Improving protein order-disorder classification using charge-hydropathy plots

Background: The earliest whole-protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle, J. Mol. Biol., 157: 105-132 (1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale.

Results: Using the performance of the C-H plot as the metric, we compared 19 alternative hydropathy scales and found that the Guy (1985) hydropathy scale (Guy, Biophys. J., 47: 61-70 (1985)) was the best of the tested scales for separating large collections of structured proteins and intrinsically disordered proteins (IDPs) on the C-H plot. Next, we developed a new scale, named IDP-Hydropathy, which further improves the discrimination between structured proteins and IDPs. Applying the C-H plot to a dataset containing 109 IDPs and 563 non-homologous fully structured proteins, the Kyte-Doolittle (1982) hydropathy scale, the Guy (1985) hydropathy scale, and the IDP-Hydropathy scale gave balanced two-state classification accuracies of 79%, 84%, and 90%, respectively, indicating that a substantial overall improvement is obtained by using different hydropathy scales. A correlation study shows that IDP-Hydropathy is strongly correlated with other hydropathy scales, suggesting that IDP-Hydropathy probably has only minor contributions from amino acid properties other than hydropathy.

Conclusion: We suggest that IDP-Hydropathy would likely be the best scale to use for any type of algorithm developed to predict protein disorder.
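As an illustration of the quantities discussed above, the following is a minimal sketch of how a C-H plot classification and a balanced two-state accuracy might be computed. It is not the authors' implementation. The Kyte-Doolittle values are the published 1982 scale, rescaled here to the range 0 to 1. The boundary coefficients (slope 2.785, intercept -1.151) are those commonly quoted for the original Uversky et al. (2000) plot and are not stated in this abstract, so treat them as an assumption. All function names are hypothetical. The IDP-Hydropathy values are not given in the abstract and are therefore not reproduced.

```python
# Kyte-Doolittle (1982) hydropathy values for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(seq, scale):
    """Upper-case the sequence and keep only residues present in the scale."""
    return [aa for aa in seq.upper() if aa in scale]


def mean_hydropathy(seq, scale=KD):
    """Mean hydropathy with the scale min-max normalized to [0, 1]."""
    residues = _standard_residues(seq, scale)
    lo, hi = min(scale.values()), max(scale.values())
    return sum((scale[aa] - lo) / (hi - lo) for aa in residues) / len(residues)


def mean_net_charge(seq, scale=KD):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    residues = _standard_residues(seq, scale)
    positive = sum(aa in "KR" for aa in residues)
    negative = sum(aa in "DE" for aa in residues)
    return abs(positive - negative) / len(residues)


def predict_disordered(seq, scale=KD, slope=2.785, intercept=-1.151):
    """True if the protein lies above the linear boundary on the C-H plot.

    The default slope and intercept are assumed from the original
    Kyte-Doolittle-based plot; a different scale would presumably need
    its own refitted boundary.
    """
    boundary = slope * mean_hydropathy(seq, scale) + intercept
    return mean_net_charge(seq, scale) > boundary


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity (both classes must be present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Balanced accuracy is a natural metric for a dataset like the one described (109 IDPs against 563 structured proteins), because plain accuracy would be dominated by the larger structured class. Passing a different dictionary as `scale` is how another hydropathy scale could be tried in this sketch.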
