Evaluating the Impact of Different Sequence Databases on Metaproteome Analysis: Insights from a Lab-Assembled Microbial Mixture

Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, the selection of sequence databases represents a major challenge. This work assessed the impact of different databases in metaproteomic investigations by using a mock microbial mixture including nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Then, both the microbial mixture and the single microorganisms were subjected to next-generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely, NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, a quantitative comparison in terms of number and overlap of peptide identifications was carried out among all databases. As a result, only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications compared to databases with generic taxonomy, while the metagenomic database enabled a slight increase with respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to their counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives.
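The two quantities compared across databases above, peptide overlap and false discovery rate, can be illustrated with a minimal sketch. It assumes a target-decoy strategy for FDR estimation (decoys divided by targets above a score cutoff); the function names, the score scale and the input layout are hypothetical and are not taken from the study's actual pipeline.

```python
from typing import Dict, List, Set, Tuple


def shared_fraction(ids_by_db: Dict[str, Set[str]]) -> float:
    """Fraction of all identified peptides that were found with every database."""
    if not ids_by_db:
        return 0.0
    peptide_sets = list(ids_by_db.values())
    all_peptides = set().union(*peptide_sets)      # identified with at least one database
    common = set.intersection(*peptide_sets)       # identified with all databases
    return len(common) / len(all_peptides) if all_peptides else 0.0


def decoy_fdr(psms: List[Tuple[float, bool]], score_cutoff: float) -> float:
    """Estimate FDR as (#decoy hits) / (#target hits) at or above the cutoff.

    Each PSM is a (score, is_decoy) pair; higher scores are assumed to be better.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= score_cutoff]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0
```

With such helpers, each database (public, genomic, metagenomic) would contribute one peptide set to `shared_fraction`, and `decoy_fdr` would be evaluated separately per database search, since a larger search space tends to change how many decoys pass a given cutoff.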
This study confirms that database selection has a significant impact on metaproteomic results, and provides critical indications for improving their depth and reliability. Specifically, the use of iterative searches and of suitable filters for taxonomic assignments is proposed with the aim of increasing coverage and trustworthiness of metaproteomic data.
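As an illustration of the lowest common ancestor approach combined with a filter on the number of taxon-specific peptides, the following is a minimal sketch. It does not reproduce MEGAN or Unipept; the lineage representation (a list of taxon names from the highest rank down to species), the function names and the `min_peptides` parameter are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List

Lineage = List[str]  # taxon names ordered from highest rank down to species


def lowest_common_ancestor(lineages: Iterable[Lineage]) -> str:
    """Return the deepest taxon shared by all lineages, or "root" if none is shared."""
    lineages = list(lineages)
    if not lineages:
        return "root"
    lca = "root"
    # Walk down the ranks in parallel and stop at the first disagreement.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        lca = taxa_at_rank[0]
    return lca


def assign_taxa(peptide_hits: Dict[str, List[Lineage]], min_peptides: int) -> Dict[str, int]:
    """Count peptides per LCA taxon and keep taxa supported by at least min_peptides."""
    counts = Counter(lowest_common_ancestor(hits) for hits in peptide_hits.values())
    return {taxon: n for taxon, n in counts.items() if n >= min_peptides}
```

A peptide matching proteins from a single species is assigned to that species, whereas a peptide shared by several taxa is pushed up to their common ancestor. Raising `min_peptides` trades sensitivity for fewer false-positive taxa, which is the kind of threshold the abstract refers to; suitable values would have to be calibrated on data with a known composition.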
