Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP

Bacteria are highly diverse, even within a species; consequently, many studies have classified a single species into multiple types and analyzed the genetic differences between them. Whole-genome sequencing (WGS) has recently become a popular approach for these analyses, and identifying single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed after WGS. The performance of SNP-calling methods therefore strongly affects the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to report a large number of false-positive SNPs relative to the small number of true SNPs among the isolates. However, the performance of various SNP callers in this situation has not been sufficiently validated. Here, we present realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when the target isolates are closely related. As an alternative, we developed a novel pipeline, BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP can also call SNPs among isolates when the reference genome is only a draft genome, or even when the user does not provide a reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.
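
Why false positives matter so much for closely related isolates can be seen by scoring calls against a known (e.g. simulated) set of true SNPs. The sketch below is illustrative only and is not the benchmarking code used in this study. It assumes each SNP is keyed by a hypothetical (sequence name, position, alternate base) tuple. The names `score_calls` and `Scores` are hypothetical, and precision, recall and F-measure stand in as common accuracy and sensitivity measures.

```python
from dataclasses import dataclass

# Hypothetical SNP key: (sequence name, 1-based position, alternate base).
Snp = tuple[str, int, str]


@dataclass(frozen=True)
class Scores:
    tp: int  # called SNPs that are in the truth set
    fp: int  # called SNPs that are not in the truth set
    fn: int  # true SNPs that were missed

    @property
    def precision(self) -> float:
        called = self.tp + self.fp
        return self.tp / called if called else 0.0

    @property
    def recall(self) -> float:
        true_total = self.tp + self.fn
        return self.tp / true_total if true_total else 0.0

    @property
    def f_measure(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def score_calls(called: set[Snp], truth: set[Snp]) -> Scores:
    """Compare a caller's SNPs with a known set of true SNPs."""
    tp = len(called & truth)
    return Scores(tp=tp, fp=len(called - truth), fn=len(truth - called))


if __name__ == "__main__":
    # Closely related isolates: only 10 true SNPs (made-up illustrative data).
    truth = {("chr1", pos, "T") for pos in range(1_000, 11_000, 1_000)}
    # A caller that finds 9 of them but also reports 50 spurious SNPs.
    called = set(sorted(truth)[:9]) | {
        ("chr1", pos, "G") for pos in range(20_000, 20_050)
    }
    s = score_calls(called, truth)
    print(f"precision={s.precision:.3f} recall={s.recall:.3f}")
    # prints "precision=0.153 recall=0.900"
```

With only 10 true SNPs, 50 false positives drive precision down to 9/59. The same 50 false positives alongside, say, 5,000 correctly called SNPs would still leave precision above 0.99. This is why a caller that looks adequate on divergent isolates can perform poorly in an outbreak setting.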
