A Workflow to Improve Variant Calling Accuracy in Molecular Barcoded Sequencing Reads

Multiplexed amplicon sequencing with molecular barcodes has previously been shown to improve the sensitivity of low-frequency variant detection. Molecular barcoded reads can be used to identify and correct both sequencing errors and amplification biases introduced during library preparation. We propose a generic workflow that improves variant calling accuracy by applying a base score correction method to duplicate or overlapping molecular barcoded read pairs across the targeted panel. The workflow reduces the false-positive rate of variant calls in homopolymer and repetitive regions, where phasing errors on the sequencer commonly interfere with base calling. The analysis focused on three specific regions within a custom QIAseq targeted DNA panel (Qiagen, USA) of 220 genes that were observed to have a high false-positive rate in a clinical validation study. Uncorrected and corrected datasets were compared at the targeted regions against reference calls from NIST and exome data from a reference laboratory (Macrogen, USA). Base correction removed all false positives that had been identified without the correction method and retained the true positive call in the dataset. These results show that a workflow incorporating base correction of molecular barcoded sequencing data can be applied to germline sequencing to improve variant calling accuracy in genomic regions with certain repetitive sequence motifs.
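
The idea of correcting bases across reads that share a molecular barcode can be illustrated with a minimal Python sketch. This is not the workflow's actual correction method, which the abstract does not specify; it is a generic quality-weighted consensus under stated assumptions. The function name `consensus_by_barcode`, the read tuple layout `(barcode, start, sequence, qualities)`, and the thresholds `min_family_size` and `min_agreement` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=2, min_agreement=0.6):
    """Collapse reads sharing a barcode and start position into one consensus.

    reads: iterable of (barcode, start, sequence, qualities) tuples, where
    qualities is a list of Phred scores with the same length as sequence.
    This tuple layout is an assumption made for the sketch.
    """
    # Group reads into families that presumably derive from one molecule.
    families = defaultdict(list)
    for barcode, start, seq, quals in reads:
        families[(barcode, start)].append((seq, quals))

    result = {}
    for key, members in families.items():
        # Singletons carry no redundancy, so they cannot be corrected.
        if len(members) < min_family_size:
            continue
        length = min(len(seq) for seq, _ in members)
        bases, quals_out = [], []
        for i in range(length):
            # Sum the base qualities supporting each candidate base.
            weight = Counter()
            for seq, quals in members:
                weight[seq[i]] += quals[i]
            base, top = weight.most_common(1)[0]
            total = sum(weight.values())
            if total > 0 and top / total >= min_agreement:
                bases.append(base)
                # Keep the best quality among reads that support the call.
                quals_out.append(
                    max(q[i] for s, q in members if s[i] == base)
                )
            else:
                # Mask positions where the family does not agree.
                bases.append("N")
                quals_out.append(0)
        result[key] = ("".join(bases), quals_out)
    return result
```

In this sketch, a low-quality discordant base in one family member is outvoted by its higher-quality duplicates, while a position with no clear majority is masked as `N` with quality 0 rather than passed to the variant caller. A production workflow would operate on aligned reads (for example, from a BAM file) and would need to handle indels, which matter most in the homopolymer and repetitive regions discussed above.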