Comprehensive large-scale assessment of intrinsic protein disorder

MOTIVATION: Intrinsically disordered regions are key to the function of numerous proteins. Because disorder is difficult to characterize experimentally, many computational predictors have been developed, each targeting different flavors of disorder. Their performance is generally measured on small datasets derived mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures.

RESULTS: MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following observations. (i) The predictors cluster according to their disorder definition, and a consensus among them gives more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level, and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed.

AVAILABILITY: The datasets are available at http://mobidb.bio.unipd.it/lsd.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
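As an illustration of observation (i), the sketch below shows one simple way a consensus over several predictors could be formed and scored per residue. It is not the procedure used in the assessment: the strict majority vote, the choice of balanced accuracy as the score, the predictor names and the toy data are all assumptions made for this example.

```python
from typing import Dict, List


def majority_consensus(predictions: Dict[str, List[int]]) -> List[int]:
    """Per-residue strict majority vote over binary predictions (1 = disordered).

    Assumption: each predictor yields one 0/1 label per residue of the same
    sequence. The voting rule is illustrative, not taken from the paper.
    """
    tracks = list(predictions.values())
    if not tracks:
        raise ValueError("no predictions given")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all predictions must cover the same sequence length")
    # A residue is called disordered when more than half of the predictors agree.
    return [1 if 2 * sum(column) > len(tracks) else 0 for column in zip(*tracks)]


def balanced_accuracy(reference: List[int], predicted: List[int]) -> float:
    """Mean of sensitivity and specificity over residues.

    Assumption: balanced accuracy is used here only as an example score.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction differ in length")
    positives = sum(reference)
    negatives = len(reference) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("reference must contain both classes")
    true_pos = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    true_neg = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    return 0.5 * (true_pos / positives + true_neg / negatives)


# Hypothetical toy data: three predictors on a six-residue sequence.
toy_predictions = {
    "predictor_a": [1, 1, 0, 0, 0, 1],
    "predictor_b": [1, 0, 0, 0, 1, 1],
    "predictor_c": [1, 1, 0, 1, 0, 0],
}
toy_reference = [1, 1, 0, 0, 0, 0]  # hypothetical experimental annotation

consensus = majority_consensus(toy_predictions)
score = balanced_accuracy(toy_reference, consensus)
```

On this toy input the consensus agrees with the reference everywhere except the last residue, where two of the three hypothetical predictors call disorder. A stricter rule, such as requiring unanimous agreement, would trade sensitivity for specificity.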
