chewBBACA: A complete suite for gene-by-gene schema creation and strain identification

Gene-by-gene approaches are becoming increasingly popular in bacterial genomic epidemiology and outbreak detection. However, open-source, scalable software for schema definition and allele calling with these methodologies is lacking. The chewBBACA suite was designed to assist users in creating and evaluating novel whole-genome or core-genome gene-by-gene typing schemas and in subsequently calling alleles in bacterial strains of interest. chewBBACA performs schema creation and allele calling on complete or draft genomes produced by de novo assemblers. The software requires Python 3.4 or higher and can run on a laptop or on high-performance computing clusters, making it useful for both small laboratories and large reference centers. chewBBACA is available at https://github.com/B-UMMI/chewBBACA.
