Characterizing the genetic basis of bacterial phenotypes using genome-wide association studies: a new direction for bacteriology

Genome-wide association studies (GWASs) have become an increasingly important approach for eukaryotic geneticists, facilitating the identification of hundreds of genetic polymorphisms that are responsible for inherited diseases. Despite the relative simplicity of bacterial genomes, the application of GWASs to identify polymorphisms responsible for important bacterial phenotypes has only recently been made possible through advances in genome sequencing technologies. Bacterial GWASs are now about to come of age, thanks both to the availability of massive datasets and to improving validation strategies, which offer the potential to bridge genomics and traditional genetic approaches. A small number of pioneering GWASs in bacteria have been published in the past 2 years, examining from 75 to more than 3,000 strains. The experimental designs have been diverse, taking advantage of the different processes that generate variation in bacteria. Analysis of data from bacterial GWASs can, to some extent, be performed using software developed for eukaryotic systems, but there are important differences in genome evolution that must be considered. The greatest experimental advantage of bacterial GWASs is the potential to perform downstream validation of causality and dissection of mechanism. We review the recent advances and remaining challenges in this field and propose strategies to improve the validation of bacterial GWASs.
