Everything at once: comparative analysis of the genomes of bacterial pathogens.

The sum of unique genes in all genomes of a bacterial species is referred to as the pan-genome; it comprises accessory genes, which are variably present or absent, and core genes, which are universally present. The accessory genome is an important source of genetic variability in bacterial populations, allowing sub-populations of bacteria to better adapt to specific niches. Such subgroups may themselves have a relatively stable core genome that may influence host preference, virulence, or association with specific disease syndromes. The core genome provides a useful means of phylogenetic reconstruction and also contributes to phenotypic heterogeneity. Variation within the pan-genome forms the basis of comparative genotyping techniques, which have evolved alongside technology. Current high-throughput sequencing platforms have created an unprecedented opportunity for comparisons among multiple, closely related genomes. The computer algorithms and software for such comparisons continue to evolve and promise exciting advances in the world of bacterial comparative genomics. We review typing techniques based on phenotypic traits and on both core and accessory genomes, and look at some of the software programs currently available to perform whole-genome comparative analyses.
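The definitions above can be illustrated with a minimal Python sketch, assuming each genome has already been reduced to a set of gene identifiers. The function name `partition_pan_genome`, the strain labels, and the gene lists are hypothetical toy data, not taken from the review; real analyses must first decide which genes count as "the same" across strains, which this sketch does not attempt.

```python
def partition_pan_genome(genomes):
    """Split gene content into pan-, core and accessory genomes.

    `genomes` maps a strain name to an iterable of gene identifiers.
    Genes are treated as identical when their identifiers match, which is
    an assumption made for illustration only.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        raise ValueError("at least one genome is required")

    pan = set().union(*gene_sets)        # every unique gene seen in any strain
    core = set.intersection(*gene_sets)  # genes present in every strain
    accessory = pan - core               # genes absent from at least one strain
    return pan, core, accessory


# Hypothetical toy input: three strains sharing two genes.
toy_genomes = {
    "strain_A": ["gyrB", "recA", "stx2"],
    "strain_B": ["gyrB", "recA", "eae"],
    "strain_C": ["gyrB", "recA"],
}

pan, core, accessory = partition_pan_genome(toy_genomes)
print(sorted(pan))
print(sorted(core))
print(sorted(accessory))
```

In this toy case the two shared genes form the core genome and the remaining two form the accessory genome; adding a strain can only shrink the core and grow the pan-genome.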
