D2P2: database of disordered protein predictions

We present the Database of Disordered Protein Prediction (D2P2), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants (VL-XT, VSL2b, PrDOS, PV2, ESpritz and IUPred) was run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains, assigned using the SUPERFAMILY predictor. Together, these disorder and structure annotations enable large-scale comparison of the disorder predictors with each other and examination of the overlap between disorder predictions and SCOP domains. D2P2 is intended to increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are available in a unified format for download as flat files or SQL tables, either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and the disordered regions from all predictors, overlaid or shown as a consensus. Statistics and tools are provided for browsing and comparing genomes and their disorder in the context of their position on the tree of life.
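The consensus view and the disorder/domain overlap described above can be illustrated with a minimal sketch. It is not the D2P2 implementation: the region convention (1-based, inclusive start and end), the function names, and the simple vote threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed convention: a region is a 1-based, inclusive (start, end) pair.
Region = Tuple[int, int]


def consensus_counts(length: int, predictions: Dict[str, List[Region]]) -> List[int]:
    """Count, for each residue, how many predictors call it disordered."""
    counts = [0] * length
    for regions in predictions.values():
        covered = set()
        for start, end in regions:
            # Clip each region to the sequence bounds.
            covered.update(range(max(start, 1), min(end, length) + 1))
        # Each predictor contributes at most one vote per residue.
        for pos in covered:
            counts[pos - 1] += 1
    return counts


def consensus_regions(counts: List[int], min_votes: int) -> List[Region]:
    """Merge consecutive residues with at least min_votes into regions."""
    regions: List[Region] = []
    start = None
    for pos, votes in enumerate(counts, start=1):
        if votes >= min_votes:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(counts)))
    return regions


def overlap_residues(a: List[Region], b: List[Region]) -> int:
    """Number of residues covered by both region lists."""
    in_a = {p for s, e in a for p in range(s, e + 1)}
    in_b = {p for s, e in b for p in range(s, e + 1)}
    return len(in_a & in_b)
```

Given per-predictor regions for one protein, `consensus_regions(consensus_counts(length, predictions), min_votes=2)` would return the stretches on which at least two predictors agree, and `overlap_residues` applied to that result and a list of domain regions would count the residues where predicted disorder and a predicted domain coincide.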
