The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis

The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers experts and non-experts alike a wide array of interconnected, state-of-the-art bioinformatics tools, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released ten years ago, the current production-level release has been available since 2008 and has served more than 1.6 million external user queries. Usage of the Toolkit has increased linearly over the years, reaching more than 400 000 queries in 2015. Through the breadth of its tools and their tight interconnection, the Toolkit has become an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatic inquiry to students in the life sciences. In this article, we report on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on the infrastructural work needed to remain operative in a changing web environment.
