Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis

Recent advances in the scale and diversity of population genomic datasets for bacteria now make it possible to study genome-wide patterns of co-evolution at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which draws on recent developments in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus, GAS). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found that these genes were also coupled to the gene encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The GAS data set represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies locus pairs that co-evolve to a statistically significant degree, potentially arising from interdependent selection on fitness that reflects underlying protein-protein interactions, or from genes whose product activities contribute to the same phenotype. This discovery approach greatly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for targeted experimental work.
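
The network-based step mentioned in the abstract, grouping coupled genes into distinct components, can be illustrated with a minimal sketch. This is not the genomeDCA implementation: it assumes the coupled site pairs have already been mapped to gene pairs, and it uses a plain union-find to recover connected components. The function name `gene_components`, the label `dhfr_gene` and the `geneA`/`geneB` pair are hypothetical placeholders, and the toy links are invented for illustration rather than taken from the data.

```python
from collections import defaultdict


def gene_components(links):
    """Group genes into connected components given (gene_a, gene_b) links."""
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    # Merge the two components joined by each link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each component under its root.
    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)

    # Largest component first; ties broken alphabetically.
    return sorted((sorted(c) for c in groups.values()),
                  key=lambda c: (-len(c), c))


# Toy input (assumed, illustrative only): three pbp genes linked to a
# placeholder dihydrofolate reductase gene, plus one unrelated pair.
toy_links = [
    ("pbp2x", "pbp1a"),
    ("pbp1a", "pbp2b"),
    ("pbp2b", "dhfr_gene"),
    ("geneA", "geneB"),
]
components = gene_components(toy_links)
for members in components:
    print(len(members), members)
```

In this toy case the resistance-related genes fall into one component and the unrelated pair into another, mirroring in miniature how separate network components can be read as candidate groups of co-selected genes.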
