Strategies for Metagenomic-Guided Whole-Community Proteomics of Complex Microbial Environments

Accurate protein identification in large-scale proteomics experiments relies on a detailed, accurate protein catalogue derived from open reading frames predicted from genome sequence data. Integrating mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and the corresponding short-read nucleotide sequence data. In this study, we benchmarked several strategies for increasing microbial peptide-spectrum matching in metaproteomic datasets, using protein predictions generated from metagenomic sequences matched to the same human fecal samples. We also investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation) and of de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that searching high mass accuracy peptide measurements against non-assembled reads from DNA sequencing of the same samples significantly increased the number of identifiable proteins without sacrificing accuracy.
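The two mass spectrometry-based filters named above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the `PSM` record, the function names, and the thresholds (10 ppm, delta correlation of 0.1) are all assumptions chosen for illustration, and delta correlation is computed in the way commonly used for SEQUEST-style scores, as the relative gap between the best and second-best correlation scores for a spectrum.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (illustrative only)."""
    peptide: str
    observed_mz: float      # measured precursor m/z
    theoretical_mz: float   # m/z calculated from the candidate peptide
    xcorr_best: float       # score of the top-ranked candidate
    xcorr_second: float     # score of the second-ranked candidate


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def delta_cn(xcorr_best: float, xcorr_second: float) -> float:
    """Relative score gap between the best and second-best candidates."""
    if xcorr_best <= 0:
        return 0.0
    return (xcorr_best - xcorr_second) / xcorr_best


def passes_filters(psm: PSM, max_ppm: float = 10.0,
                   min_delta_cn: float = 0.1) -> bool:
    """Keep a PSM only if it meets both thresholds.

    Both default thresholds are assumed values, not taken from the study.
    """
    mass_ok = abs(ppm_error(psm.observed_mz, psm.theoretical_mz)) <= max_ppm
    delta_ok = delta_cn(psm.xcorr_best, psm.xcorr_second) >= min_delta_cn
    return mass_ok and delta_ok


# Made-up example records, not data from the study.
example_psms = [
    PSM("PEPTIDEK", 500.002, 500.000, 3.0, 2.0),  # small mass error, clear gap
    PSM("AAAK", 500.020, 500.000, 3.0, 2.0),      # large mass error
    PSM("CCCK", 500.001, 500.000, 3.0, 2.9),      # ambiguous top two scores
]
accepted = [p.peptide for p in example_psms if passes_filters(p)]
```

Requiring both conditions reflects the idea that a tight precursor mass tolerance and a clear separation between the top two candidates each remove a different class of doubtful assignment. Suitable thresholds would have to be calibrated for each dataset, for example against a decoy database.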
