A comprehensive assessment of long intrinsic protein disorder from the DisProt database

Motivation: Intrinsic disorder (ID), i.e. the lack of a unique folded conformation under physiological conditions, is a common feature of many proteins. Detecting it requires specialized biochemical experiments that are not high‐throughput. Missing X‐ray residues from the PDB have therefore been widely used as a proxy for ID when developing computational methods. This may lead to a systematic bias, where predictors deviate from biologically relevant ID. Large benchmarking sets of experimentally validated ID are scarce. Recently, the DisProt database has been renewed and expanded to include manually curated ID annotations for several hundred new proteins. This provides a large benchmark set which has not yet been used for training ID predictors.

Results: Here, we describe the first systematic benchmarking of ID predictors on the new DisProt dataset. In contrast to previous assessments based on missing X‐ray data, this dataset contains mostly long ID regions and a significant number of fully ID proteins. The benchmarking shows that ID predictors work quite well on the new dataset, especially for long ID segments. However, a large fraction of ID still goes virtually undetected and the ranking of methods differs from that obtained on PDB data. In particular, many predictors appear to confound ID and regions outside X‐ray structures. This suggests that the ID prediction methods capture different flavors of disorder and can benefit from highly accurate curated examples.

Availability and implementation: The raw data used for the evaluation are available from http://www.disprot.org/assessment/.
