Sequencing of animal viruses: quality data assurance for NGS bioinformatics

Background: Next generation sequencing (NGS) is becoming widely used in diagnostic and research laboratories, and it is now applied to a variety of disciplines, including veterinary virology. The NGS workflow comprises several steps, namely sample processing, library preparation, sequencing and primary/secondary/tertiary bioinformatics (BI) analyses. The latter is a complex process that is extremely difficult to standardize, owing to the variety of tools and metrics available. It is therefore of the utmost importance to assess the comparability of results obtained through different methods and in different laboratories. To this end, we organized a proficiency test focused on the bioinformatics components for the generation of complete genome sequences of salmonid rhabdoviruses.

Methods: Three partners, which performed virus sequencing using different commercial library preparation kits and NGS platforms, shared 75 raw datasets with one another. Each participant analyzed the datasets separately to produce consensus sequences according to its own bioinformatics pipeline. Results were then compared to highlight discrepancies, and a subset of inconsistencies was investigated in more detail.

Results: In total, we observed 526 discrepancies, of which 39.5% were located at genome termini, 14.1% in intergenic regions and 46.4% in coding regions. Among these, 10 SNPs and 99 indels caused changes in the protein products. Overall reproducibility was 99.94%. In the subset of inconsistencies investigated in depth, manual curation appeared to be the most critical step affecting sequence comparability, suggesting that harmonization of this phase is crucial to obtaining comparable results. Analysis of a calibrator sample allowed BI accuracy to be assessed; it was 99.983%.

Conclusions: We demonstrated the applicability and usefulness of BI proficiency testing for assuring the quality of NGS data, and we recommend wider implementation of such exercises to guarantee sequence data uniformity among different virology laboratories.
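As an illustration of how discrepancies between participants' consensus sequences might be tallied, the following minimal Python sketch counts mismatching columns between pre-aligned sequences and derives a pooled pairwise agreement percentage. The abstract does not state how reproducibility was actually calculated, so the metric, the function names (`count_discrepancies`, `pairwise_agreement`) and the toy sequences are assumptions made for illustration only, not the method or data of the study.

```python
from itertools import combinations
from typing import Dict


def count_discrepancies(a: str, b: str) -> int:
    """Count alignment columns at which two aligned sequences differ.

    Gaps ('-') are treated as ordinary characters, so a base opposite a gap
    counts as one discrepancy (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def pairwise_agreement(consensus: Dict[str, str]) -> float:
    """Percentage of identical columns pooled over all participant pairs."""
    compared = 0
    mismatched = 0
    for (_, a), (_, b) in combinations(consensus.items(), 2):
        mismatched += count_discrepancies(a, b)
        compared += len(a)
    if compared == 0:
        raise ValueError("need at least two non-empty sequences")
    return 100.0 * (compared - mismatched) / compared


# Hypothetical, made-up aligned consensus sequences for one dataset.
aligned = {
    "lab_A": "ATGGCGTTAA",
    "lab_B": "ATGGCGTTAA",
    "lab_C": "ATGACGTT-A",
}

if __name__ == "__main__":
    print(f"pairwise agreement: {pairwise_agreement(aligned):.2f}%")
```

In practice the consensus sequences would first need to be aligned (for example with a multiple sequence aligner), and discrepancies could then be binned by genome region (termini, intergenic, coding) using the genome annotation.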
