Proficiency Testing of Metagenomics-Based Detection of Food-Borne Pathogens Using a Complex Artificial Sequencing Dataset

Metagenomics-based high-throughput sequencing (HTS) enables comprehensive detection of all species contained in a sample with a single assay and is becoming a standard method for outbreak investigation. However, unlike real-time PCR or serological assays, HTS datasets generated for pathogen detection do not readily provide yes/no answers. Rather, the results of the taxonomic read assignment need to be assessed by trained personnel to extract information from them. Proficiency tests are important instruments of validation, harmonization, and standardization. Within the European Union-funded project COMPARE [COllaborative Management Platform for detection and Analyses of (Re-) emerging and foodborne outbreaks in Europe], we conducted a proficiency test to scrutinize the ability to assess diagnostic metagenomics data. An artificial dataset resembling shotgun sequencing of RNA from a sample of contaminated trout was provided to 12 participants, who were asked to return a table with per-read taxonomic assignments at species level and a report summarizing and assessing their findings, considering categories such as pathogen, background, or contamination. Analysis of the read assignment tables showed that, overall, the software used classified the reads reliably. However, the use of incomplete reference databases or inappropriate data pre-processing caused difficulties. From the combination of the participants’ reports with their read assignments, we conclude that, although most species were detected, a number of important taxa were either not categorized or categorized incorrectly. This implies that knowledge and awareness of potentially dangerous species and contaminations need to be improved; hence, capacity building for the interpretation of diagnostic metagenomics datasets is necessary.
