Genome-wide epistasis and co-selection study using mutual information

Abstract

Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population-level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of the pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure that adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results, similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight into when this type of analysis is most likely to succeed. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries even in the absence of phenotypic data.
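The core computation described above, scoring dependencies between pairs of polymorphic sites while correcting for population structure without a tree, can be illustrated with a minimal sketch. It assumes the correction takes the form of sequence reweighting: each isolate is down-weighted by the number of isolates within a normalized Hamming distance threshold, a scheme common in the direct-coupling analysis literature, combined with a uniform pseudocount. The function names (`sequence_weights`, `weighted_mi`, `score_all_pairs`, `outlier_threshold`), the threshold of 0.1, the pseudocount weight of 0.5 and the use of Tukey's outer fence as a cut-off are all hypothetical choices for illustration, not the published SpydrPick implementation.

```python
import math
import statistics
from collections import Counter
from itertools import combinations


def sequence_weights(seqs, threshold=0.1):
    """Weight each isolate by 1 / (number of isolates, itself included, whose
    normalized Hamming distance to it is at most `threshold`).
    Clusters of near-identical isolates then contribute roughly one
    effective sequence, damping the signal of shared ancestry."""
    length = len(seqs[0])
    weights = []
    for a in seqs:
        neighbours = sum(
            1 for b in seqs
            if sum(x != y for x, y in zip(a, b)) / length <= threshold
        )
        weights.append(1.0 / neighbours)
    return weights


def weighted_mi(seqs, weights, i, j, lam=0.5):
    """Mutual information (in nats) between alignment columns i and j,
    using weighted frequencies mixed with a uniform pseudocount of
    weight `lam` (0 < lam < 1) so every joint state has nonzero mass."""
    states_i = sorted({s[i] for s in seqs})
    states_j = sorted({s[j] for s in seqs})
    total = sum(weights)
    counts = Counter()
    for s, w in zip(seqs, weights):
        counts[(s[i], s[j])] += w
    n_joint = len(states_i) * len(states_j)
    joint = {
        (a, b): (1 - lam) * counts[(a, b)] / total + lam / n_joint
        for a in states_i
        for b in states_j
    }
    # Marginals are summed from the smoothed joint so the two stay consistent.
    p_i = {a: sum(joint[(a, b)] for b in states_j) for a in states_i}
    p_j = {b: sum(joint[(a, b)] for a in states_i) for b in states_j}
    return sum(p * math.log(p / (p_i[a] * p_j[b])) for (a, b), p in joint.items())


def score_all_pairs(seqs, threshold=0.1, lam=0.5):
    """Score every pair of columns; returns {(i, j): MI}."""
    weights = sequence_weights(seqs, threshold)
    length = len(seqs[0])
    return {
        (i, j): weighted_mi(seqs, weights, i, j, lam)
        for i, j in combinations(range(length), 2)
    }


def outlier_threshold(scores):
    """Tukey's outer fence, Q3 + 3 * IQR, as one simple cut-off for
    flagging unusually high MI values (needs at least two scores)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q3 + 3 * (q3 - q1)
```

On the toy alignment `["AAC", "AAG", "TTC", "TTG"]`, `score_all_pairs` gives a positive score to the perfectly linked column pair (0, 1) and a score of zero to the two pairs involving the unlinked third column. The all-pairs loop and the quadratic weighting step are written for clarity only; scaling to pan-genome data would require a far more efficient implementation.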
