A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane

Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. It has therefore become an indispensable tool in fields such as microbiology and related medical applications. The computational challenges of analyzing metaproteomics datasets differ from those of pure-culture proteomics, for example because samples are more complex and reference databases are larger, which demands dedicated computing pipelines. Such analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and leads the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed and flexible guide to metaproteomics data analysis to date. Beginners are provided with robust, easy-to-use, state-of-the-art data analysis that completes in a reasonable time (a few hours, depending on factors such as the protein database size and the number of identified peptides and inferred proteins), while advanced users benefit from the flexibility and adaptability of the workflow.
