PredictProtein - Predicting Protein Structure and Function for 29 Years

Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis. Its main site is hosted at the Luxembourg Centre for Systems Biomedicine (LCSB), and in 2020 it was queried by over 3,000 users per month. PredictProtein was the first Internet server for protein predictions, and it pioneered the combination of evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges), and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA-binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; switching to MMseqs2 for sequence searches reduced runtime five-fold; new user interface elements improved usability; and new prediction methods were added. Among these, PredictProtein recently added predictions based on deep learning embeddings (GO terms and secondary structure) and a method that predicts proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aims to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.

Availability: freely accessible web server at PredictProtein.org; source code and Docker images at github.com/rostlab.
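Offline, high-throughput runs typically start from batches of sequences in FASTA format. As a minimal, hypothetical sketch of one preparatory step (it is not part of PredictProtein and does not use its API), the Python below parses multi-FASTA text and flags characters outside an assumed amino-acid alphabet before the sequences are handed to a local installation. The names `parse_fasta`, `invalid_residues`, `validate`, and the `ALLOWED` alphabet are illustrative assumptions; adjust the alphabet to whatever your own pipeline accepts.

```python
from typing import Iterator

# Assumed alphabet: the 20 standard amino acids plus common ambiguity /
# non-standard codes (X, B, Z, U, O). This is an illustrative choice only.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | set("XBZUO")


def parse_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from multi-FASTA text.

    The identifier is the first whitespace-delimited token after '>';
    sequence lines are concatenated and upper-cased.
    """
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first '>' header")
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def invalid_residues(seq: str) -> set[str]:
    """Return the characters in seq that are not in the assumed alphabet."""
    return set(seq) - ALLOWED


def validate(text: str) -> dict[str, set[str]]:
    """Map each record ID to its invalid characters (empty set means OK)."""
    return {rid: invalid_residues(seq) for rid, seq in parse_fasta(text)}


if __name__ == "__main__":
    example = ">P1 demo\nMKTAYIAKQR\nQISFVKSHFS\n>P2\nMKT*J\n"
    for rid, bad in validate(example).items():
        status = "ok" if not bad else "invalid: " + "".join(sorted(bad))
        print(rid, status)
```

Records with an empty set can be queued for prediction; the others can be corrected or dropped first. Raising an error on sequence data before any header, rather than silently ignoring it, is a deliberate choice so that malformed batch files fail early.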
