Practical Approaches for Detecting Selection in Microbial Genomes

Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these pressures act can help address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing gives microbiologists the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.
