FDA-ARGOS: A Public Quality-Controlled Genome Database Resource for Infectious Disease Sequencing Diagnostics and Regulatory Science Research

Infectious disease next-generation sequencing (ID-NGS) diagnostics are on the cusp of revolutionizing the clinical market. To facilitate this transition, FDA proactively invested in tools to support innovation in emerging technologies. FDA and collaborators established a publicly available database, the FDA dAtabase for Regulatory-Grade micrObial Sequences (FDA-ARGOS), as a tool to fill reference database gaps with quality-controlled genomes. This manuscript discusses quality control metrics for the proposed FDA-ARGOS genomic resource and outlines the need for quality-controlled genome gap filling in the public domain. We also present three case studies showcasing potential applications of FDA-ARGOS in infectious disease diagnostics: assay design, use as a reference database, and in silico sequence comparison combined with wet lab testing of representative microbial organisms, a novel composite validation strategy for ID-NGS diagnostics. Using FDA-ARGOS as an in silico comparator tool could reduce the burden of completing ID-NGS clinical trials. In addition, use cases identifying Enterococcus avium and Ebola virus (Zaire ebolavirus variant Makona) demonstrate, respectively, the utility of FDA-ARGOS as a reference database for independent performance validation of new tests and how the database can serve as an in silico sequence target comparator tool for ID-NGS validation.
