DescribePROT: database of amino acid-level protein structure and function predictions

Abstract We present DescribePROT, a database of predicted amino acid-level descriptors of protein structure and function. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position-specific scoring matrices, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number or entry name. Because the results are pre-computed, they are available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
