LFMD: detecting low-frequency mutations in high-depth genome sequencing data without molecular tags

As next-generation sequencing (NGS) and liquid biopsy become more prevalent in research and in the clinic, there is a growing need for methods that reduce cost and improve the sensitivity and specificity of low-frequency mutation detection, where the alternative allele frequency (AAF) is below 1%. Here we propose a likelihood-based approach, the Low-Frequency Mutation Detector (LFMD), which combines the advantages of duplex sequencing (DS) and the bottleneck sequencing system (BotSeqS) to maximize the utilization of duplicate reads. Compared with the existing state-of-the-art methods (DS, Du Novo, UMI-tools, and Unified Consensus Maker), our method achieves higher sensitivity, higher specificity (< 4 × 10⁻¹⁰ errors per base sequenced), and lower cost (reduced by up to ~70%) without additional experimental steps, customized adapters, or molecular tags. LFMD is useful in areas where high precision is required, such as drug resistance prediction and cancer screening. As an example application, mitochondrial heterogeneity analysis of 28 human brain samples across different stages of Alzheimer's disease (AD) showed that the canonical oxidative-damage-related mutations, C:G>A:T, are significantly increased in the mid-stage group. This is consistent with the Mitochondrial Free Radical Theory of Aging and suggests that AD may be linked to the aging of brain cells induced by oxidative damage.
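To make the tag-free idea concrete, the following is a minimal sketch of how duplicate reads might be grouped into families by their alignment endpoints (the BotSeqS-style substitute for molecular tags) and then collapsed into a duplex-style consensus. It is an illustration under stated assumptions, not LFMD's implementation: the abstract does not specify the likelihood model, so a simple majority vote is swapped in here, and the function names, the read tuple layout, and the `min_fraction` threshold are all hypothetical. Sequences are assumed to be already aligned and expressed in reference orientation.

```python
from collections import Counter, defaultdict


def group_by_endpoints(reads):
    """Group reads into families keyed by (chrom, start, end).

    `reads` is an iterable of (chrom, start, end, strand, sequence) tuples.
    This layout is a hypothetical stand-in for parsed alignment records.
    """
    families = defaultdict(list)
    for chrom, start, end, strand, seq in reads:
        families[(chrom, start, end)].append((strand, seq))
    return families


def consensus(seqs, min_fraction=0.6):
    """Column-wise majority vote; emit 'N' where support is too weak.

    This vote is a simplification; LFMD itself is described as
    likelihood-based, and that model is not reproduced here.
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(family):
    """Require agreement between the two strand-specific consensuses.

    Returns None when a family lacks reads from one of the strands.
    """
    fwd = [seq for strand, seq in family if strand == "+"]
    rev = [seq for strand, seq in family if strand == "-"]
    if not fwd or not rev:
        return None
    a, b = consensus(fwd), consensus(rev)
    return "".join(x if x == y else "N" for x, y in zip(a, b))
```

In this sketch, a base that differs in only one read of a family is outvoted, while a variant present in the reads of both strands survives into the duplex consensus. Families with reads from a single strand are set aside rather than called.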
