Quantitative proteogenomics of human pathogens using DIA-MS.

The increasing number of bacterial genomes, in combination with reproducible quantitative proteome measurements, provides new opportunities to explore how genetic differences modulate proteome composition and virulence. Combining genome and proteome data is challenging, however, because the proteome of each strain depends on its underlying genome. We present a strategy to facilitate the integration of genome data from several genetically similar bacterial strains with data-independent analysis mass spectrometry (DIA-MS) for rapid interrogation of the combined data sets. The strategy relies on the construction of a composite genome that combines all genetic data in a compact format and can accommodate fusion with quantitative peptide and protein information determined via DIA-MS. We demonstrate the method by combining data sets from whole-genome sequencing, shotgun MS and DIA-MS from 34 clinical isolates of Streptococcus pyogenes. The data structure allows fast exploration of the data, showing that undetected proteins are on average more amenable to amino acid substitution than expressed proteins. We identified several proteins that were significantly differentially expressed between invasive and non-invasive strains. The work underlines how integration of whole-genome sequencing with accurately quantified proteomes can further advance the interpretation of the relationship between genomes, proteomes and virulence. This article is part of a Special Issue entitled: Computational Proteomics.
