The Integrated Rapid Infectious Disease Analysis (IRIDA) Platform

Whole genome sequencing (WGS) is a powerful tool for public health infectious disease investigations owing to its higher resolution, greater efficiency, and cost-effectiveness compared with traditional genotyping methods. Implementation of WGS in routine public health microbiology laboratories is impeded by a lack of user-friendly automated and semi-automated pipelines, restrictive jurisdictional data sharing policies, and the proliferation of non-interoperable analytical and reporting systems. To address these issues, we developed the Integrated Rapid Infectious Disease Analysis (IRIDA) platform (irida.ca), a user-friendly, decentralized, open-source bioinformatics and analytical web platform to support real-time infectious disease outbreak investigations using WGS data. Instances can be installed independently on local high-performance computing infrastructure, enabling private and secure data management and analyses in accordance with organizational policies and governance. IRIDA’s data management capabilities enable secure upload, storage and sharing of all WGS data and metadata. The core platform currently includes pipelines for quality control, assembly, annotation, variant detection, phylogenetic analysis, in silico serotyping, multi-locus sequence typing, and genome distance calculation. Analysis pipeline results can be visualized within the platform through dynamic line lists and integrated phylogenomic clustering, supporting research and discovery as well as decision-making and hypothesis generation in epidemiological investigations. Communication and data exchange between instances are enabled through customizable access controls. IRIDA complements centralized systems, empowering local analytics and visualizations for genomics-based microbial pathogen investigations. IRIDA is currently transforming the Canadian public health ecosystem and is freely available at https://github.com/phac-nml/irida and www.irida.ca.
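
Genome distance calculation is listed among the core pipelines without further detail. As one illustration of what such a step can compute, the sketch below estimates a Mash-style distance between two assembled sequences: each genome is reduced to a MinHash "bottom-s" sketch of hashed canonical k-mers, the Jaccard index J is estimated from the sketches, and J is converted to an approximate per-base mutation distance D = -(1/k) ln(2J / (1 + J)). This is a generic, minimal example and not IRIDA's own pipeline; the function names, the k-mer size of 21, the sketch size of 1000 and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import math

# Illustrative sketch only: not IRIDA's implementation; k, s and the hash are assumed.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Collect canonical k-mers (lexicographic min of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguous bases
        rc = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a stable 64-bit integer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq: str, k: int, s: int) -> list[int]:
    """MinHash bottom-s sketch: the s smallest k-mer hash values."""
    return sorted(hash_kmer(km) for km in canonical_kmers(seq, k))[:s]


def jaccard_estimate(sketch_a: list[int], sketch_b: list[int], s: int) -> float:
    """Estimate J as the fraction of the union's bottom-s hashes present in both sketches."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:s]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)


def mash_style_distance(seq_a: str, seq_b: str, k: int = 21, s: int = 1000) -> float:
    """Convert the Jaccard estimate into an approximate per-base mutation distance."""
    j = jaccard_estimate(bottom_sketch(seq_a, k, s), bottom_sketch(seq_b, k, s), s)
    if j == 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    if j >= 1.0:
        return 0.0  # identical sketches
    return -math.log(2 * j / (1 + j)) / k
```

Identical sequences (or a sequence and its reverse complement, because k-mers are canonicalized) give a distance of 0.0, and sequences with no k-mers in common give 1.0. Comparing small fixed-size sketches rather than full genomes is what keeps this kind of all-against-all comparison cheap as the number of isolates grows.
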
Impact Statement
Whole genome sequencing (WGS) is revolutionizing infectious disease analysis and surveillance due to its cost-effectiveness, utility, and improved analytical power. To date, no “one-size-fits-all” genomics platform has been universally adopted, owing to differences in national (and regional) health information systems, data sharing policies and computational infrastructures, as well as a lack of interoperability and prohibitive costs. The Integrated Rapid Infectious Disease Analysis (IRIDA) platform is a user-friendly, decentralized, open-source bioinformatics and analytical web platform developed to support real-time infectious disease outbreak investigations using WGS data. IRIDA empowers public health, regulatory and clinical microbiology laboratory personnel to better incorporate WGS technology into routine operations by shielding them from the computational and analytical complexities of big data genomics. IRIDA is now routinely used as part of a validated suite of tools to support outbreak investigations in Canada. Although IRIDA was designed to serve the needs of the Canadian public health system, it is generally applicable to any public health or multi-jurisdictional environment. IRIDA enables localized analyses while providing mechanisms and standard outputs for data sharing. This approach can help overcome pervasive challenges in real-time global infectious disease surveillance, investigation and control, resulting in faster responses and, ultimately, better public health outcomes.

Data Summary
Data used to generate some of the figures in this manuscript can be found in the NCBI BioProject PRJNA305824.