The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies

For nearly 100 years, serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone of food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates three sequence-based typing analyses: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprising 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database of over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata-driven visualizations for examining the phylogenetic, geospatial, and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. Moreover, this type of integrated analysis, using multiple sequence-based subtyping methods, allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
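
To illustrate how cgMLST context could supplement a serovar call, the following minimal Python sketch compares allelic profiles and borrows the serovar of the closest reference genome. It is not the SISTR implementation: the function names, the distance definition (fraction of mismatching alleles over loci called in both profiles), the 0.1 distance cutoff, and the toy profiles are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# A cgMLST profile maps locus name -> allele number (None = locus not called).
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> float:
    """Fraction of mismatching alleles over loci called in both profiles."""
    shared = [
        locus for locus in a
        if locus in b and a[locus] is not None and b[locus] is not None
    ]
    if not shared:
        # No comparable loci: treat the profiles as maximally distant.
        return 1.0
    mismatches = sum(1 for locus in shared if a[locus] != b[locus])
    return mismatches / len(shared)


def nearest_serovar(
    query: Profile,
    references: Dict[str, Tuple[Profile, str]],
    max_distance: float = 0.1,  # hypothetical cutoff, not a published value
) -> Optional[str]:
    """Return the serovar of the closest reference, or None if none is close."""
    best: Optional[Tuple[float, str]] = None
    for _genome_id, (profile, serovar) in references.items():
        d = allelic_distance(query, profile)
        if best is None or d < best[0]:
            best = (d, serovar)
    if best is None or best[0] > max_distance:
        return None
    return best[1]


# Toy data (invented): four loci, two reference genomes.
references = {
    "ref_A": ({"l1": 1, "l2": 2, "l3": 4, "l4": 5}, "Enteritidis"),
    "ref_B": ({"l1": 1, "l2": 2, "l3": 3, "l4": 9}, "Typhimurium"),
}
query = {"l1": 1, "l2": 2, "l3": 3, "l4": None}

print(nearest_serovar(query, references))
```

In this sketch, a locus left uncalled in a draft assembly is simply excluded from the comparison, so incomplete assemblies can still be placed relative to the reference set. A real pipeline would need a principled way to combine this neighbor-based call with the genoserotyping result.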

[1]  B. Korczak,et al.  Fast DNA Serotyping of Escherichia coli by Use of an Oligonucleotide Microarray , 2006, Journal of Clinical Microbiology.

[2]  E. Lingohr,et al.  Multi-laboratory evaluation of the rapid genoserotyping array (SGSA) for the identification of Salmonella serovars. , 2014, Diagnostic microbiology and infectious disease.

[3]  Zhengwei Zhu,et al.  CD-HIT: accelerated for clustering the next-generation sequencing data , 2012, Bioinform..

[4]  R. Parreñas,et al.  Sequencing and Comparative Analysis of Flagellin Genes fliC, fljB, and flpA from Salmonella , 2004, Journal of Clinical Microbiology.

[5]  M. Bissell Multiplex, Bead-Based Suspension Array for Molecular Determination of Common Salmonella Serogroups , 2009 .

[6]  R. Kaas,et al.  Solving the Problem of Comparing Whole Bacterial Genomes across Different Sequencing Platforms , 2014, PloS one.

[7]  Martin C. J. Maiden,et al.  mlstdbNet – distributed multi-locus sequence typing (MLST) databases , 2004, BMC Bioinformatics.

[8]  James H. Jorgensen,et al.  Manual of Clinical Microbiology, 11th Edition , 2015 .

[9]  Miriam L. Land,et al.  Trace: Tennessee Research and Creative Exchange Prodigal: Prokaryotic Gene Recognition and Translation Initiation Site Identification Recommended Citation Prodigal: Prokaryotic Gene Recognition and Translation Initiation Site Identification , 2022 .

[10]  Victor P J Gannon,et al.  Everything at once: comparative analysis of the genomes of bacterial pathogens. , 2011, Veterinary microbiology.

[11]  B. Swaminathan,et al.  PulseNet: the molecular subtyping network for foodborne bacterial disease surveillance, United States. , 2001, Emerging infectious diseases.

[12]  G. Dougan,et al.  Routine Use of Microbial Whole Genome Sequencing in Diagnostic and Public Health Microbiology , 2012, PLoS pathogens.

[13]  Sergey I. Nikolenko,et al.  SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing , 2012, J. Comput. Biol..

[14]  S. Gharbia,et al.  Flagellin gene sequence evolution in Salmonella. , 2007, Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases.

[15]  Zhemin Zhou,et al.  Multilocus Sequence Typing as a Replacement for Serotyping in Salmonella enterica , 2012, PLoS pathogens.

[16]  W. M. Dunne,et al.  Next-generation and whole-genome sequencing in the diagnostic clinical microbiology laboratory , 2012, European Journal of Clinical Microbiology & Infectious Diseases.

[17]  Keith A. Jolley,et al.  Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain , 2012, Microbiology.

[18]  M. Gilmour,et al.  Public Health Genomics and the New Molecular Epidemiology of Bacterial Pathogens , 2013, Public Health Genomics.

[19]  K. Balakrishna,et al.  Detection of Salmonella enterica serovar Typhi (S. Typhi) by selective amplification of invA, viaB, fliC‐d and prt genes by polymerase chain reaction in mutiplex format , 2006, Letters in applied microbiology.

[20]  M. Sánchez-Jiménez,et al.  Development and evaluation of a multiplex real-time polymerase chain reaction procedure to clinically type prevalent Salmonella enterica serovars. , 2010, The Journal of molecular diagnostics : JMD.

[21]  E. J. Threlfall,et al.  Development of a Multiplex Primer Extension Assay for Rapid Detection of Salmonella Isolates of Diverse Serotypes , 2010, Journal of Clinical Microbiology.

[22]  Roy Fielding,et al.  Architectural Styles and the Design of Network-based Software Architectures"; Doctoral dissertation , 2000 .

[23]  M. Anjum,et al.  Rapid Genoserotyping Tool for Classification of Salmonella Serovars , 2011, Journal of Clinical Microbiology.

[24]  J. Bray,et al.  MLST revisited: the gene-by-gene approach to bacterial genomics , 2013, Nature Reviews Microbiology.

[25]  Ning Ma,et al.  BLAST+: architecture and applications , 2009, BMC Bioinformatics.

[26]  M. Struelens,et al.  From molecular to genomic epidemiology: transforming surveillance and control of infectious diseases. , 2013, Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin.

[27]  J. Fierer,et al.  Diverse virulence traits underlying different clinical outcomes of Salmonella infection. , 2001, The Journal of clinical investigation.

[28]  P. Fields,et al.  Multiplex, Bead-Based Suspension Array for Molecular Determination of Common Salmonella Serogroups , 2007, Journal of Clinical Microbiology.

[29]  T. Joys,et al.  Molecular analyses of the Salmonella g. . . flagellar antigen complex , 1993, Journal of bacteriology.

[30]  Alexey A. Gurevich,et al.  QUAST: quality assessment tool for genome assemblies , 2013, Bioinform..

[31]  Eduardo N. Taboada,et al.  MIST: A Tool for Rapid in silico Generation of Molecular Data from Bacterial Genome Sequences , 2013, BIOINFORMATICS.

[32]  P. Wattiau,et al.  Evaluation of the Premi Test Salmonella, a commercial low-density DNA microarray system intended for routine identification and typing of Salmonella enterica. , 2008, International journal of food microbiology.

[33]  Daniel Müllner,et al.  fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python , 2013 .

[34]  F. Weill,et al.  WHO Collaborating Centre for Reference and Research on Salmonella ANTIGENIC FORMULAE OF THE SALMONELLA SEROVARS , 2007 .

[35]  P. Fields,et al.  Methodologies towards the development of an oligonucleotide microarray for determination of Salmonella serotypes. , 2007, Journal of microbiological methods.

[36]  Keith A. Jolley,et al.  A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter , 2012, Genes.

[37]  N. Stralis-Pavese,et al.  Development of an oligonucleotide microarray method for Salmonella serotyping , 2008, Microbial biotechnology.

[38]  K. Katoh,et al.  MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability , 2013, Molecular biology and evolution.

[39]  Frank M. Aarestrup,et al.  Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data , 2015, Journal of Clinical Microbiology.

[40]  M. Achtman,et al.  Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[41]  Yanlong Yin,et al.  Salmonella Serotype Determination Utilizing High-Throughput Genome Sequencing Data , 2015, Journal of Clinical Microbiology.

[42]  Justin Zobel,et al.  SRST2: Rapid genomic surveillance for public health and hospital microbiology labs , 2014, bioRxiv.

[43]  Adam Godzik,et al.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences , 2006, Bioinform..

[44]  R. Wollin A study of invasiveness of different Salmonella serovars based on analysis of the Enter-net database. , 2007, Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin.

[45]  P. Fields,et al.  Molecular Determination of H Antigens of Salmonella by Use of a Microsphere-Based Liquid Array , 2010, Journal of Clinical Microbiology.