A Rigorous Interlaboratory Examination of the Need to Confirm Next-Generation Sequencing–Detected Variants with an Orthogonal Method in Clinical Genetic Testing

Orthogonal confirmation of next-generation sequencing (NGS)-detected germline variants is standard practice, although published studies have suggested that confirmation of the highest-quality calls may not always be necessary. The key question is how laboratories can establish criteria that consistently identify those NGS calls that require confirmation. Most prior studies addressing this question have had limitations: they were generally small in scale, omitted statistical justification, and explored only limited aspects of the underlying data. The rigorous definition of criteria that separate high-accuracy NGS calls from those that may or may not be true remains a crucial issue. We analyzed five reference samples and over 80,000 patient specimens from two laboratories. Quality metrics were examined for approximately 200,000 NGS calls with orthogonal data, including 1662 false positives. A classification algorithm used these data to identify a battery of criteria that flag 100% of false positives as requiring confirmation (CI lower bound, 98.5% to 99.8%, depending on variant type) while minimizing the number of flagged true positives. These criteria identify false positives that previously published criteria miss. Sampling analysis showed that smaller data sets resulted in less effective criteria. Our methodology for determining test- and laboratory-specific criteria can be generalized into a practical approach that laboratories can use to reduce the cost and time burdens of confirmation without affecting clinical accuracy.
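
To illustrate why a 100% observed flagging rate is reported with a CI lower bound below 100%, the following is a minimal sketch and not the authors' calculation. The abstract does not state which interval method was used; the sketch assumes a one-sided exact binomial bound at the 95% level, and the function name `exact_lower_bound` and the example count of 200 are hypothetical. If every one of n known false positives is flagged, the probability of that outcome under a true flagging rate p is p^n, so the lower bound is the p for which p^n equals alpha.

```python
def exact_lower_bound(n_flagged: int, alpha: float = 0.05) -> float:
    """One-sided exact binomial lower bound on the true flagging rate,
    for the special case where all n_flagged known false positives
    were flagged (observed rate of 100%).

    Assumption: one-sided bound with significance level alpha. The
    interval method used in the study is not given in the abstract.
    """
    if n_flagged <= 0:
        raise ValueError("n_flagged must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Solve p ** n_flagged == alpha for p.
    return alpha ** (1.0 / n_flagged)


if __name__ == "__main__":
    # 1662 is the total false-positive count given in the abstract;
    # 200 is a hypothetical smaller subset used only for comparison.
    for n in (200, 1662):
        print(f"n = {n:5d}  lower bound = {exact_lower_bound(n):.4f}")
```

Under these assumptions the bound tightens as the number of observed false positives grows, which is one reason the size of the data set matters when criteria are claimed to catch all false positives.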
