Misannotation of multiple-nucleotide variants risks misdiagnosis

Multiple-nucleotide variants (MNVs) are miscalled by the most widely used next-generation sequencing (NGS) analysis pipelines, creating a risk of missed diagnoses. These variants, which should be treated as a single insertion-deletion mutation event, are commonly called as separate single-nucleotide variants. This can result in misannotation, incorrect amino acid predictions and, potentially, false positive and false negative diagnostic results. Using simulated data and re-analysis of sequencing data from a diagnostic targeted gene panel, we demonstrate that the widely adopted GATK best practices pipeline miscalls MNVs and that alternative tools can call these variants correctly. Adopting calling methods that annotate MNVs correctly would solve the problem for individual laboratories; however, GATK best practices are also the basis for important public resources such as the gnomAD database. We therefore suggest that integrating a solution into these guidelines would be the optimal approach.
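The effect of annotating the two halves of an MNV separately can be illustrated with a small sketch. The example below is not taken from the study data or from the output of any variant caller: the reference codon `TCG`, the two substitutions, and the helper names `apply_snvs` and `consequence` are all hypothetical, chosen only to show how per-variant annotation and joint annotation of the same codon can disagree.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def consequence(ref_codon, alt_codon):
    """Classify the predicted protein-level effect of a codon change."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    return f"missense {ref_aa}>{alt_aa}"


# Hypothetical example: two adjacent substitutions on the same allele.
ref = "TCG"                  # Ser
snv_a = (1, "A")             # second base C>A
snv_b = (2, "C")             # third base G>C

# Annotated separately, as if they were independent SNVs.
separate = [
    consequence(ref, apply_snvs(ref, [snv_a])),
    consequence(ref, apply_snvs(ref, [snv_b])),
]

# Annotated together, as a single MNV.
joint = consequence(ref, apply_snvs(ref, [snv_a, snv_b]))

print("separate:", separate)
print("joint:   ", joint)
```

In this illustrative case the separate annotation reports a stop-gain alongside a synonymous change, whereas the joint annotation reports a single missense change, which is the kind of discrepancy that could lead to a false positive result. The opposite situation, where two apparently benign SNVs combine into a damaging change, would correspond to a false negative.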
