pepFunk: a tool for peptide-centric functional analysis of metaproteomic human gut microbiome studies.

MOTIVATION: Enzymatic digestion of proteins before mass spectrometry analysis is a key process in metaproteomic workflows. Canonical metaproteomic data processing pipelines typically match the spectra produced by the mass spectrometer against a database of theoretical spectra, then match the identified peptides back to their parent proteins. However, enzymatic digestion produces peptides that can occur in multiple proteins, whether through sequence conservation or by chance, which complicates protein and functional assignment.

RESULTS: To address this challenge, we developed pepFunk, a peptide-centric metaproteomic workflow focused on the analysis of human gut microbiome samples. Our workflow includes a curated peptide database annotated with KEGG terms and a pathway enrichment method, inspired by gene set variation analysis, adapted for peptide-level data. Analysis using our peptide-centric workflow is fast and highly correlated with a protein-centric analysis, and it can identify more enriched KEGG pathways than analysis using protein-level data. Our workflow is open source and available as a web application or as source code to be run locally.

AVAILABILITY AND IMPLEMENTATION: pepFunk is available online as a web application at https://shiny.imetalab.ca/pepFunk/ with open source code available from https://github.com/northomics/pepFunk.

SUPPLEMENTARY INFORMATION: Supplementary figures follow the manuscript, and the peptide-to-KEGG database is submitted as a supplementary file.
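The shared-peptide problem described in the motivation can be illustrated with a minimal Python sketch. It is not part of pepFunk. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), and the two protein sequences and the names `tryptic_digest` and `shared_peptides` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """In silico digest under a simplified trypsin rule (assumption):
    cleave after K or R unless followed by P, no missed cleavages.
    Peptides shorter than min_length are discarded."""
    fragments = re.split(r"(?<=[KR])(?!P)", sequence)
    return [f for f in fragments if len(f) >= min_length]


def shared_peptides(proteins):
    """Return peptides that map to more than one parent protein.
    `proteins` maps a protein name to its amino acid sequence."""
    parents = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            parents[peptide].add(name)
    return {pep: sorted(names) for pep, names in parents.items() if len(names) > 1}


# Hypothetical toy sequences that share one conserved stretch.
toy_proteins = {
    "protein_A": "MKTAYIAKQRGLSDEELVKAAPWTIDGR",
    "protein_B": "MSRGLSDEELVKTTFHPEQR",
}

ambiguous = shared_peptides(toy_proteins)
print(ambiguous)
```

In this toy case the peptide `GLSDEELVK` cannot be assigned to a single parent protein, which is the kind of ambiguity a peptide-centric annotation is meant to sidestep by attaching functional terms to the peptide itself.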
