Metaproteomics: Extracting and Mining Proteome Information to Characterize Metabolic Activities in Microbial Communities

Contemporary microbial ecology studies usually employ one or more "omics" approaches to investigate the structure and function of microbial communities. Among these, metaproteomics aims to characterize the metabolic activities of the community members, providing a direct link between genetic potential and functional metabolism. Successful metaproteomics research depends on integrating high-quality experimental and bioinformatic techniques to uncover the metabolic activities of a microbial community in a way that complements other "meta-omic" approaches. The essential, quality-defining informatics steps in a metaproteomics investigation are: (1) construction of the metagenome, (2) functional annotation of predicted protein-coding genes, (3) protein database searching, (4) protein inference, and (5) extraction of metabolic information. In this article, we provide an overview of current bioinformatic approaches and software implementations used in metaproteome studies, and highlight the key considerations for successfully applying this powerful community-biology tool.

Curr. Protoc. Bioinform. 46:13.26.1-13.26.14. © 2014 by John Wiley & Sons, Inc.
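As an illustration of step (4), protein inference, the sketch below shows one common idea: choosing a small set of proteins that together explain every identified peptide. The abstract does not say which inference algorithm the article recommends, so the greedy set-cover heuristic here is an assumption, and the function name `greedy_parsimony` and the protein and peptide labels are hypothetical.

```python
from typing import Dict, List, Set


def greedy_parsimony(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Pick proteins greedily until every identified peptide is explained.

    Illustrative heuristic only (greedy set cover); it is not taken from
    the article and does not guarantee a minimum-size protein list.
    """
    # All peptides that still need at least one explaining protein.
    unexplained: Set[str] = set()
    for peptides in protein_to_peptides.values():
        unexplained |= peptides

    selected: List[str] = []
    while unexplained:
        # Choose the protein covering the most unexplained peptides.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & unexplained),
        )
        selected.append(best)
        unexplained -= protein_to_peptides[best]
    return selected


# Hypothetical peptide-to-protein matches from a database search.
example_hits = {
    "protA": {"pep1", "pep2", "pep3"},
    "protB": {"pep2", "pep3"},  # all of its peptides are shared with protA
    "protC": {"pep4"},
}

print(greedy_parsimony(example_hits))
```

In this toy input, `protB` is never selected because every peptide it matches is already explained by `protA`. In a metaproteome, where closely related strains share many peptides, such ambiguous proteins are typically reported as groups rather than silently dropped.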
