MetaGOmics: A Web-Based Tool for Peptide-Centric Functional and Taxonomic Analysis of Metaproteomics Data

Metaproteomics is the characterization of all proteins expressed by a community of organisms in a complex biological sample at a single point in time. Applications of metaproteomics range from the comparative analysis of environmental samples (such as ocean water and soil) to microbiome data from multicellular organisms (such as the human gut). Metaproteomics research often focuses on the quantitative functional makeup of the metaproteome and on which organisms are making those proteins. That is: What are the functions of the currently expressed proteins? How much of the metaproteome is associated with those functions? And which microorganisms are expressing the proteins that perform those functions? However, traditional protein-centric functional analysis is greatly complicated by the large size, redundancy, and lack of biological annotations for the protein sequences in the database used to search the data. To help address these issues, we have developed an algorithm and web application (dubbed “MetaGOmics”) that automates the quantitative functional (using Gene Ontology) and taxonomic analysis of metaproteomics data and the subsequent visualization of the results. MetaGOmics is designed to overcome the shortcomings of traditional proteomics analysis when used with metaproteomics data. It is easy to use, requires minimal input, and fully automates most steps of the analysis, including comparing the functional makeup between samples. MetaGOmics is freely available at https://www.yeastrc.org/metagomics/.
