Uncertainty analysis in protein disorder prediction.

A grand challenge in the proteomics and structural genomics era is the prediction of protein structure, including the identification of proteins that are partially or wholly unstructured. A number of predictors for identifying intrinsically disordered proteins (IDPs) have been developed over the last decade, but none can be taken as fully reliable on its own. Using a single model for prediction is typically inadequate, because relying on only the most accurate model ignores model uncertainty. In this paper, we present an empirical method to specify and measure the uncertainty associated with disorder predictions. In particular, we analyze the uncertainty in the reference model itself and the uncertainty in the data. This is achieved by training a set of models and developing several meta-predictors on top of them. The best meta-predictor achieved results comparable to or better than those of any single model, suggesting that incorporating different aspects of protein disorder prediction is important for the disorder prediction task. In addition, the best meta-predictor had more balanced sensitivity and specificity than any individual model. We also assessed how disorder predictions change as a function of changes in the protein sequence. For collections of homologous sequences, we found that mutations caused many residues predicted as disordered to flip to predicted ordered, while the reverse was observed much less frequently. These results suggest that disorder tendencies are more sensitive to allowed mutations than structure tendencies, and that conservation of disorder is indeed less stable than conservation of structure.

Availability: Five meta-predictors and four single models developed for this study will be publicly and freely accessible for non-commercial use.
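The abstract does not specify how the meta-predictors combine the individual models or how prediction flips were counted. As a rough illustration only, the sketch below assumes the simplest possible combination, an unweighted average of per-residue disorder scores, and uses the spread across models as a crude proxy for model uncertainty. The function names `meta_predict` and `count_flips`, the 0.5 threshold, and the "D"/"O" labels are hypothetical choices made for this example, not taken from the paper.

```python
from statistics import mean, pstdev


def meta_predict(per_model_scores, threshold=0.5):
    """Combine per-residue disorder scores from several models.

    per_model_scores: one list of scores in [0, 1] per model, all the same length.
    Returns (consensus, spread, labels), where spread is the population
    standard deviation across models at each residue (an assumed uncertainty proxy).
    """
    n_residues = len(per_model_scores[0])
    if any(len(scores) != n_residues for scores in per_model_scores):
        raise ValueError("all models must score the same number of residues")

    consensus, spread = [], []
    # zip(*...) walks the sequence residue by residue, across models.
    for residue_scores in zip(*per_model_scores):
        consensus.append(mean(residue_scores))
        spread.append(pstdev(residue_scores))

    # "D" = predicted disordered, "O" = predicted ordered (assumed labels).
    labels = ["D" if score >= threshold else "O" for score in consensus]
    return consensus, spread, labels


def count_flips(reference_labels, homolog_labels):
    """Count aligned positions whose predicted state differs between two sequences.

    Assumes both label sequences are already aligned and gap-free.
    Returns (disorder_to_order, order_to_disorder).
    """
    pairs = list(zip(reference_labels, homolog_labels))
    disorder_to_order = sum(1 for ref, hom in pairs if ref == "D" and hom == "O")
    order_to_disorder = sum(1 for ref, hom in pairs if ref == "O" and hom == "D")
    return disorder_to_order, order_to_disorder


if __name__ == "__main__":
    # Made-up scores from three hypothetical models over three residues.
    scores = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.7], [0.8, 0.0, 0.8]]
    consensus, spread, labels = meta_predict(scores)
    print(labels, [round(s, 3) for s in spread])
    print(count_flips("DDDOO", "DOOOD"))
```

Residues where the models disagree strongly get a larger spread, which is one simple way to flag low-confidence predictions. A trained combiner or a weighted vote would be equally plausible readings of "meta-predictor"; the average is used here only because it needs no training data.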
