A Practical Bioinformatics Workflow for Routine Analysis of Bacterial WGS Data

The use of whole-genome sequencing (WGS) for bacterial characterisation has increased substantially in the last decade. Its high throughput and decreasing cost have led to significant changes in outbreak investigations and in the surveillance of a wide variety of microbial pathogens. Despite the many advantages of WGS, several drawbacks concerning data analysis and management, as well as a general lack of standardisation, hinder its integration into routine use. This work presents a bioinformatics workflow for bacterial characterisation from Illumina WGS data. It covers genome annotation, species identification, serotype prediction, antimicrobial resistance prediction, detection of virulence-related genes and plasmid replicons, core-genome-based or single nucleotide polymorphism (SNP)-based phylogenetic clustering, and sequence typing. The workflow was tested on 22 in-house sequences of Salmonella enterica isolates belonging to a local outbreak, together with 182 publicly available Salmonella genomes. No errors were reported during execution, and all genomes were analysed. The workflow can be tailored to other pathogens of interest and is freely available for academic and non-profit use as a file that can be uploaded to the Galaxy platform.
