RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants

Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. Although high-throughput sequencing can capture such variants when primary tumor samples are sequenced, accurate detection is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand cancer etiology and tumor evolution. We present RareVar, an experimental and computational modeling framework that reliably identifies low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. The RareVar protocol includes a benchmark design that pools DNA from previously sequenced individuals at various concentrations to target variants at desired frequencies, 0.5% to 3% in our case. By applying a position-specific error model based on a generalized linear model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied to most capture and sequencing platforms without modifying the experimental protocol.
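The arithmetic behind the pooled benchmark design can be illustrated with a minimal sketch. It assumes a diploid genome and a variant carried by only one donor in the pool, so the expected allele frequency is the donor's share of the pool times the fraction of that donor's alleles carrying the variant. The function names and the read depth below are hypothetical and are not taken from the RareVar implementation.

```python
def expected_allele_frequency(pool_fraction, alt_copies, ploidy=2):
    """Expected variant allele frequency in a DNA pool.

    Assumes the variant is private to one donor, who contributes
    `pool_fraction` of the pooled DNA and carries `alt_copies`
    variant alleles out of `ploidy`.
    """
    if not 0.0 <= pool_fraction <= 1.0:
        raise ValueError("pool_fraction must be between 0 and 1")
    if not 0 <= alt_copies <= ploidy:
        raise ValueError("alt_copies must be between 0 and ploidy")
    return pool_fraction * alt_copies / ploidy


def pool_fraction_for_target(target_vaf, alt_copies, ploidy=2):
    """Donor share of the pool needed to reach a target allele frequency."""
    if alt_copies <= 0:
        raise ValueError("the donor must carry at least one variant allele")
    fraction = target_vaf * ploidy / alt_copies
    if fraction > 1.0:
        raise ValueError("target frequency is not reachable with one donor")
    return fraction


if __name__ == "__main__":
    # Donor shares needed to place heterozygous variants at the
    # lower and upper ends of the 0.5% to 3% target range.
    for target in (0.005, 0.03):
        share = pool_fraction_for_target(target, alt_copies=1)
        print(f"target VAF {target:.3f} -> donor share {share:.3f}")

    # Expected variant-supporting reads at a hypothetical depth of 2000x.
    depth = 2000
    vaf = expected_allele_frequency(0.01, alt_copies=1)
    print(f"expected variant reads at {depth}x: {depth * vaf:.1f}")
```

Under these assumptions, a donor making up 1% of the pool places its heterozygous private variants at an expected frequency of 0.5%, and a homozygous variant from the same donor would sit at 1%.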
