MobiDB‐lite: fast and highly specific consensus prediction of intrinsic disorder in proteins

Motivation: Intrinsic disorder (ID) is established as an important feature of protein sequences. Its use in proteome annotation is, however, hampered by the availability of many methods with similar performance at the single residue level, which have mostly not been optimized to predict long ID regions of size comparable to domains.

Results: Here, we have focused on providing a single consensus‐based prediction, MobiDB‐lite, optimized for highly specific (i.e. few false positive) predictions of long disorder. The method uses eight different predictors to derive a consensus, which is then filtered for spurious short predictions. Consensus prediction is shown to outperform the single methods when annotating long ID regions. MobiDB‐lite can be useful in large‐scale annotation scenarios and has already been integrated in the MobiDB, DisProt and InterPro databases.

Availability and Implementation: MobiDB‐lite is available as part of the MobiDB database from URL: http://mobidb.bio.unipd.it/. An executable can be downloaded from URL: http://protein.bio.unipd.it/mobidblite/.

Contact: silvio.tosatto@unipd.it

Supplementary information: Supplementary data are available at Bioinformatics online.
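The consensus-then-filter idea described in the Results can be illustrated with a minimal sketch. It is not the MobiDB‐lite implementation: the function name `consensus_disorder`, the simple vote threshold `min_votes`, and the length cutoff `min_length` are hypothetical choices made for illustration, and the abstract does not state the actual agreement threshold, minimum region length, or any additional smoothing the method may apply. Each predictor is assumed to supply one binary call per residue (1 for disordered, 0 for ordered).

```python
from typing import List, Sequence, Tuple


def consensus_disorder(
    predictions: Sequence[Sequence[int]],
    min_votes: int,
    min_length: int,
) -> List[Tuple[int, int]]:
    """Return disordered regions as (start, end) pairs, 0-based and inclusive.

    predictions: one binary per-residue track per predictor (assumed input format).
    min_votes:   predictors that must agree for a residue to count as disordered
                 (illustrative parameter, not the published threshold).
    min_length:  regions shorter than this are discarded as spurious
                 (illustrative parameter, not the published cutoff).
    """
    if not predictions:
        return []
    n_residues = len(predictions[0])
    if any(len(track) != n_residues for track in predictions):
        raise ValueError("all prediction tracks must have the same length")

    # Step 1: per-residue vote count across predictors, thresholded to a consensus.
    votes = [sum(column) for column in zip(*predictions)]
    consensus = [v >= min_votes for v in votes]

    # Step 2: collect contiguous disordered runs and keep only the long ones.
    regions: List[Tuple[int, int]] = []
    start = None
    for i, is_disordered in enumerate(consensus + [False]):  # sentinel closes a trailing run
        if is_disordered and start is None:
            start = i
        elif not is_disordered and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    return regions


# Toy example with three hypothetical predictors on a 10-residue sequence.
toy_predictions = [
    [1, 1, 1, 1, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
]
toy_regions = consensus_disorder(toy_predictions, min_votes=2, min_length=3)
```

In the toy example, residues 0 to 4 reach two votes and form a run of five residues, which is kept. Residue 7 also reaches two votes but forms a run of length one, so the length filter removes it. Raising `min_votes` or `min_length` trades sensitivity for specificity, which is the direction the abstract says the method is optimized in.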
