Where differences resemble: sequence-feature analysis in curated databases of intrinsically disordered proteins

Abstract: Intrinsic disorder (ID) in proteins is involved in crucial interactions in the living cell. As the importance of ID is increasingly recognized, detailed analyses aimed at its identification and characterization are becoming more common. Whether ID comes in distinct 'flavors' representing different sub-phenomena remains an open question. Several databases collect manually curated examples of experimentally validated ID, each focusing on an apparently different aspect of the phenomenon. The recent update of MobiDB provided the opportunity for an in-depth comparison of the content of these validated ID collections, namely DIBS, DisProt, IDEAL, MFIB, FuzDB, ELM and UniProt. To assess what is specific to different ID flavors, we analyzed relevant sequence-based features, such as amino acid composition, length, taxa and gene ontology terms, highlighting differences and similarities among datasets. Despite the databases' different focuses, the majority of the considered features are not statistically different across them, with the exception of ELM. FuzDB also shares half of its entries with DisProt. In general, the different ID databases describe similar phenomena. DisProt, which is the largest database, better represents the entire spectrum of disorder flavors and the corresponding sequence diversity.
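To make two of the comparisons mentioned in the abstract concrete, the following is a minimal Python sketch of how amino acid composition and entry overlap between two collections might be computed. It is an illustration only, not the pipeline used in the study: the function names `aa_composition` and `shared_fraction`, the toy sequences, and the placeholder identifiers are all assumptions introduced here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequences):
    """Return the fraction of each standard residue over all given sequences.

    Non-standard characters (e.g. 'X', gaps) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def shared_fraction(entries_a, entries_b):
    """Return the fraction of entries in collection A that also occur in B."""
    a, b = set(entries_a), set(entries_b)
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    # Toy disordered-region sequences (hypothetical, not taken from any database).
    toy_regions = ["SSPEEKKS", "GGSPQQ"]
    comp = aa_composition(toy_regions)
    print({aa: round(f, 3) for aa, f in comp.items() if f > 0})

    # Placeholder entry identifiers (hypothetical).
    db_a = {"ID1", "ID2", "ID3", "ID4"}
    db_b = {"ID2", "ID4", "ID5"}
    print(shared_fraction(db_a, db_b))
```

Note that `shared_fraction` is asymmetric by design: it reports how much of the first collection is covered by the second, which matches a statement of the form "A shares half of its entries with B".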
