High efficiency error suppression for accurate detection of low-frequency variants

Abstract: Detection of cancer-associated somatic mutations has broad applications for oncology and precision medicine. However, this becomes challenging when cancer-derived DNA is in low abundance, such as in impure tissue specimens or in circulating cell-free DNA. Next-generation sequencing (NGS) is particularly prone to technical artefacts that can limit the accuracy of calling low-allele-frequency mutations. State-of-the-art methods to improve detection of low-frequency mutations often employ unique molecular identifiers (UMIs) for error suppression; however, these methods are highly inefficient as they depend on redundant sequencing to assemble consensus sequences. Here, we present a novel strategy to enhance the efficiency of UMI-based error suppression by retaining single reads (singletons) that can participate in consensus assembly. This ‘Singleton Correction’ methodology outperformed other UMI-based strategies in efficiency, leading to greater sensitivity with high specificity in a cell line dilution series. Significant benefits were seen with Singleton Correction at sequencing depths ≤16 000×. We validated the utility and generalizability of this approach in a cohort of >300 individuals whose peripheral blood DNA was subjected to hybrid capture sequencing at ∼5000× depth. Singleton Correction can be incorporated into existing UMI-based error suppression workflows to boost mutation detection accuracy, thus improving the cost-effectiveness and clinical impact of NGS.
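
The idea of retaining singletons in UMI-based consensus assembly can be illustrated with a minimal Python sketch. The abstract does not spell out the rescue rule, so the sketch assumes one: a singleton is kept when reads carrying the same UMI exist on the complementary strand, and only positions where the two strands agree are retained. The function names, the input tuple layout, and the `min_fraction` threshold are all hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter, defaultdict


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote; emit 'N' where no base reaches min_fraction."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def suppress_errors(reads):
    """Collapse reads into error-suppressed sequences.

    reads: iterable of (umi, strand, sequence) with strand in {'+', '-'}.
    Assumption: sequences are equal length and already oriented to the
    same reference strand, so they can be compared base by base.
    Returns (corrected, discarded).
    """
    families = defaultdict(list)
    for umi, strand, seq in reads:
        families[(umi, strand)].append(seq)

    corrected = {}
    discarded = []
    for (umi, strand), seqs in families.items():
        if len(seqs) >= 2:
            # Conventional path: redundant reads yield a consensus.
            corrected[(umi, strand)] = consensus(seqs)
            continue
        # Singleton path (assumed rule): look for same-UMI evidence
        # on the complementary strand instead of discarding the read.
        other = "-" if strand == "+" else "+"
        partner = families.get((umi, other))
        if partner:
            partner_seq = consensus(partner)
            corrected[(umi, strand)] = "".join(
                a if a == b else "N" for a, b in zip(seqs[0], partner_seq)
            )
        else:
            discarded.append((umi, strand))
    return corrected, discarded
```

In a conventional workflow every family of size one would be dropped; under the assumed rule above, only singletons without any complementary-strand support are lost, which is the source of the efficiency gain the abstract describes.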
