The In Silico Genotyper (ISG): an open-source pipeline to rapidly identify and annotate nucleotide variants for comparative genomics applications

The identification and annotation of nucleotide variants, including insertions/deletions and single nucleotide polymorphisms (SNPs), from whole genome sequence data is important for studies of bacterial evolution, comparative genomics, and phylogeography. The In Silico Genotyper (ISG) is a parallel, tested, open-source tool that performs these functions and scales well to thousands of bacterial genomes. ISG is written in Java and requires MUMmer (Delcher et al., 2003), BWA (Li and Durbin, 2009), and GATK (McKenna et al., 2010) for full functionality. The source code and compiled binaries are freely available from https://github.com/TGenNorth/ISGPipeline under a GNU General Public License. Benchmark comparisons demonstrate that ISG is faster and more flexible than comparable tools.

[1] D. Sarovich, et al. SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets, 2014, BMC Research Notes.

[2] Brian D. Ondov, et al. Rapid Core-Genome Alignment and Visualization for Thousands of Intraspecific Microbial Genomes, 2014, bioRxiv.

[3] Barry G. Hall, et al. When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes, 2013, PLoS ONE.

[4] Heng Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, 2013, arXiv:1303.3997.

[5] Huiguang Yi, et al. Co-phylog: an assembly-free phylogenomic approach for closely related organisms, 2010, Nucleic acids research.

[6] Ruifu Yang, et al. Historical variations in mutation rate in an epidemic pathogen, Yersinia pestis, 2012, Proceedings of the National Academy of Sciences.

[7] Simon Rasmussen, et al. snpTree - a web-server to identify and construct SNP trees from whole genome sequence data, 2012, BMC Genomics.

[8] M. DePristo, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, 2010, Genome research.

[9] P. Keim, et al. Humans and evolutionary and ecological forces shaped the phylogeography of recently emerged diseases, 2009, Nature Reviews Microbiology.

[10] Scott N Peterson, et al. Whole genome single nucleotide polymorphism based phylogeny of Francisella tularensis and its application to the development of a strain typing assay, 2009, BMC Microbiology.

[11] Richard Durbin, et al. Fast and accurate short read alignment with Burrows-Wheeler transform, 2009.

[12] S. Salzberg, et al. Using MUMmer to Identify Similar Regions in Large Sequence Sets, 2004.

[13] S. Salzberg, et al. Fast algorithms for large-scale genome alignment and comparison, 2002, Nucleic acids research.