Accuracy of protein-level disorder predictions

Experimental annotations of intrinsic disorder are available for 0.1% of the 147 000 000 currently sequenced proteins. Over 60 sequence-based disorder predictors have been developed to help bridge this gap. Current benchmarks of these methods assess predictive performance on datasets of proteins; however, predictions are often interpreted for individual proteins. We demonstrate that protein-level predictive performance varies substantially from the dataset-level benchmarks. Thus, we perform a first-of-its-kind protein-level assessment of 13 popular disorder predictors using 6200 disorder-annotated proteins. We show that the protein-level distributions are substantially skewed toward high predictive quality while having long tails of poor predictions. Consequently, between 57% and 75% of proteins achieve higher predictive performance than the currently used dataset-level assessment suggests, but as many as 30% of proteins, those located in the long tails, suffer low predictive performance. These proteins typically have relatively high amounts of disorder, in contrast to the mostly structured proteins that are predicted accurately by all 13 methods. Interestingly, each predictor provides the most accurate results for some proteins, while the method that performs best at the dataset level is in fact the best for only about 30% of proteins. Moreover, for the majority of proteins, at least four disorder predictors produce predictions that are more accurate than the dataset-level performance of the most accurate tool. While these results suggest that disorder predictors outperform their current benchmark performance for the majority of proteins and that they complement each other, novel tools that accurately identify the hard-to-predict proteins and that make accurate predictions for these proteins are needed.
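
The distinction between dataset-level and protein-level assessment can be illustrated with a minimal sketch. It assumes per-residue binary labels (1 = disordered, 0 = ordered) and uses plain accuracy as a stand-in metric; the abstract does not state which metric the study uses, and the function names and toy data below are hypothetical.

```python
from statistics import median

def accuracy(true_labels, predicted_labels):
    """Fraction of residues whose predicted state matches the annotation."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

def dataset_level_accuracy(proteins):
    """Pool the residues of all proteins and score them once."""
    all_true = [t for true, _ in proteins for t in true]
    all_pred = [p for _, pred in proteins for p in pred]
    return accuracy(all_true, all_pred)

def protein_level_accuracies(proteins):
    """Score each protein separately, giving a distribution of values."""
    return [accuracy(true, pred) for true, pred in proteins]

# Hypothetical toy data: (annotated labels, predicted labels) per protein.
toy_proteins = [
    ([0] * 10, [0] * 10),                    # structured, predicted perfectly
    ([0] * 10, [0] * 9 + [1]),               # structured, one residue wrong
    ([0] * 8 + [1] * 2, [0] * 8 + [1] * 2),  # short disordered tail, perfect
    ([1] * 20, [0] * 12 + [1] * 8),          # highly disordered, poorly predicted
]

pooled = dataset_level_accuracy(toy_proteins)
per_protein = protein_level_accuracies(toy_proteins)

# Share of proteins scoring above the pooled, dataset-level value.
fraction_above = sum(a > pooled for a in per_protein) / len(per_protein)

print(f"dataset-level accuracy: {pooled:.2f}")
print(f"protein-level accuracies: {per_protein}")
print(f"median protein-level accuracy: {median(per_protein):.2f}")
print(f"fraction of proteins above dataset level: {fraction_above:.2f}")
```

In this toy case a single long, poorly predicted protein pulls the pooled score down, so most proteins score above the dataset-level value while one sits in a low-performance tail. This mirrors the kind of skew the abstract describes, without reproducing the study's data or metric.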
