Ask2Me VarHarmonizer: A Python-Based Tool to Harmonize Variants from Cancer Genetic Testing Reports and Map them to the ClinVar Database

PURPOSE: The popularity of germline genetic panel testing has led to a vast accumulation of variant-level data. Variant names are not always consistent across laboratories and are not easily mapped to public variant databases such as ClinVar. A tool that automates variant harmonization and mapping is needed to help clinicians ensure that their variant interpretations are accurate.

METHODS: We present a Python-based tool, Ask2Me VarHarmonizer, that incorporates data cleaning, name harmonization, and a four-attempt procedure for mapping variants to ClinVar. We applied this tool to map variants from a pilot dataset collected from 11 clinical practices. Mapping results were evaluated with and without transcript information.

RESULTS: Using Ask2Me VarHarmonizer, 4728 of 6027 variant entries (78%) were successfully mapped to ClinVar, corresponding to 3699 unique mappable variants. Together with 1099 unique unmappable variants, a total of 4798 unique variants were identified. Of these, 427 (9%) had multiple names, and 343 (7% of all unique variants) had multiple names within a single practice. Mapping results obtained with and without transcript information were 99% consistent.

CONCLUSION: Ask2Me VarHarmonizer aggregates and structures variant data, harmonizes names, and maps variants to ClinVar. Harmonization removes the ambiguity and redundancy that arise when variants are reported by different sources.
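To illustrate the kind of name harmonization described above, the following is a minimal sketch and not the Ask2Me VarHarmonizer implementation. It assumes cDNA-level names only, and the functions `clean_variant_name` and `harmonize`, the cleaning rules, and the example entries are all hypothetical. The sketch reduces differently written names to one comparable key and then counts how many unique variants were reported under more than one name.

```python
import re
from collections import defaultdict


def clean_variant_name(raw: str) -> str:
    """Return a simplified, comparable form of a cDNA-level variant name.

    Hypothetical cleaning rules for illustration only; real reports need
    many more cases (protein-level names, legacy nomenclature, typos).
    """
    name = raw.strip()
    # Drop a leading transcript such as "NM_007294.3:" or "NM_007294.3(BRCA1):".
    name = re.sub(r"^[A-Za-z]{2}_\d+(\.\d+)?(\([^)]*\))?:", "", name)
    # Remove internal whitespace.
    name = re.sub(r"\s+", "", name)
    # Remove the coding-DNA prefix if present; it is added back at the end.
    if name[:2].lower() == "c.":
        name = name[2:]
    # Upper-case bases, then restore lower-case edit keywords.
    name = name.upper()
    name = re.sub(r"(del|dup|ins|inv)", lambda m: m.group(1).lower(),
                  name, flags=re.IGNORECASE)
    # Drop explicitly listed bases after a trailing del/dup ("delAG" -> "del").
    name = re.sub(r"(del|dup)[ACGT]+$", r"\1", name)
    return "c." + name


def harmonize(entries):
    """Group raw names by (gene, cleaned name); entries are (gene, raw_name)."""
    groups = defaultdict(set)
    for gene, raw in entries:
        key = (gene.strip().upper(), clean_variant_name(raw))
        groups[key].add(raw.strip())
    return dict(groups)


# Made-up example entries, written the way different reports might write them.
entries = [
    ("BRCA1", "c.68_69delAG"),
    ("brca1", "NM_007294.3:c.68_69del"),
    ("BRCA1", " C.68_69 DELAG "),
    ("BRCA2", "5946delT"),
    ("BRCA2", "c.5946del"),
]

groups = harmonize(entries)
multi_named = {key for key, names in groups.items() if len(names) > 1}

print(f"{len(entries)} entries, {len(groups)} unique variants, "
      f"{len(multi_named)} with multiple names")
```

Keying on the gene together with the cleaned name keeps identical position strings in different genes apart. Mapping the harmonized keys to ClinVar would be a separate step and is not shown here.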
