FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science

FDA proactively invests in tools to support innovation in emerging technologies, such as infectious disease next-generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS, a public database of quality-controlled reference genomes for diagnostic purposes, and demonstrate its utility in two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need to fill genome quality gaps in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes than with non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. Using FDA-ARGOS as an in silico target sequence comparator tool, combined with representative clinical testing, could reduce the burden of completing ID-NGS clinical trials.

Using infectious disease next-generation sequencing as a diagnostic tool requires appropriate reference datasets. Here, Sichtig et al. describe FDA-ARGOS, a reference database of high-quality microbial reference genomes, and demonstrate its utility in two use cases.
