Using high abundance proteins as guides for fast and effective peptide/protein identification from metaproteomic data

Background: Several recent large-scale efforts have significantly expanded the collection of human-associated bacterial genomes, which now contains thousands of entities, including complete and draft reference genomes and metagenome-assembled genomes (MAGs). These genomes provide a useful resource for studying the functionality of the human-associated microbiome and its relationship with human health and disease. One application of these genomes is to provide a universal reference for database search in metaproteomic studies when matched metagenomic or metatranscriptomic data are unavailable. However, a larger collection of reference genomes does not necessarily result in better peptide/protein identification, because the increase in search space often leads to fewer spectrum-peptide matches, not to mention the drastic increase in computation time.

Methods: Here, we present a new approach that uses two steps to optimize the use of the reference genomes and MAGs as the universal reference for human gut metaproteomic MS/MS data analysis. The first step is to use only the High Abundance Proteins (HAPs) (i.e., ribosomal proteins and elongation factors) for metaproteomic MS/MS database search and, based on the identification results, to derive the taxonomic composition of the underlying microbial community. The second step is to expand the search database by including all proteins from the identified abundant species. We call our approach HAPiID (HAPs guided metaproteomics IDentification).

Results: We tested our approach using human gut metaproteomic datasets from a previous study and compared it with MetaPro-IQ, the state-of-the-art reference database search method for metaproteomic identification in studies of the human gut microbiota. Our results show that our two-step method not only ran significantly faster but also identified more peptides. We further demonstrated the application of HAPiID to revealing the protein profiles of individual human-associated bacterial species, one or a few species at a time, using metaproteomic data.

Conclusions: The HAP-guided profiling approach presents a novel and effective way of constructing target databases for metaproteomic data analysis. The HAPiID pipeline built upon this approach provides a universal tool for analyzing human gut-associated metaproteomic data.
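The two-step database construction described in the Methods can be illustrated with a minimal Python sketch. It is not the HAPiID implementation: the function names, the input data structures, the toy protein identifiers and sequences, and in particular the `min_peptides` cutoff used to decide which species count as "abundant" are all assumptions made for illustration, since the abstract does not specify how abundant species are selected.

```python
from collections import defaultdict


def select_abundant_species(hap_hits, hap_to_species, min_peptides=2):
    """Step 1 (sketch): infer abundant species from a HAP-only search.

    hap_hits maps each identified HAP protein ID to the set of peptides
    matched to it. A species is kept when it is supported by at least
    min_peptides distinct peptides (an assumed, hypothetical cutoff).
    """
    peptides_by_species = defaultdict(set)
    for protein_id, peptides in hap_hits.items():
        peptides_by_species[hap_to_species[protein_id]].update(peptides)
    return {
        species
        for species, peptides in peptides_by_species.items()
        if len(peptides) >= min_peptides
    }


def build_expanded_database(proteomes, abundant_species):
    """Step 2 (sketch): pool all proteins of the selected species."""
    expanded = {}
    for species in sorted(abundant_species):
        expanded.update(proteomes[species])
    return expanded


# Toy inputs (entirely made up) standing in for real search results.
hap_hits = {
    "spA_rpsB": {"AVLEELGK", "TTDVTGK"},  # ribosomal protein, species A
    "spA_tuf": {"GITINTSHVEYDTPTR"},      # elongation factor, species A
    "spB_rplC": {"LSVEEAVK"},             # ribosomal protein, species B
}
hap_to_species = {"spA_rpsB": "spA", "spA_tuf": "spA", "spB_rplC": "spB"}
proteomes = {
    "spA": {"spA_rpsB": "MAVVT", "spA_tuf": "MSKEK", "spA_gene3": "MKLLS"},
    "spB": {"spB_rplC": "MIGLV", "spB_gene2": "MTTQA"},
    "spC": {"spC_gene1": "MNNRP"},
}

abundant = select_abundant_species(hap_hits, hap_to_species)
expanded_db = build_expanded_database(proteomes, abundant)
```

In this toy example only species A passes the assumed cutoff, so the second-round search database contains its three proteins rather than the six proteins of the full collection. That reduction in search space is the effect the approach relies on.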
