Identification of single nucleotide variants using position-specific error estimation in deep sequencing data

Background
Targeted deep sequencing is a highly effective technology for identifying known and novel single nucleotide variants (SNVs), with many applications in translational medicine, disease monitoring and cancer profiling. However, identifying SNVs from deep sequencing data is a challenging computational problem, because sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs).

Results
To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection. Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAFs below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and comparing the results to digital droplet PCR experiments.

Conclusions
AmpliSolve is a new tool for in silico estimation of background noise and for detection of low-frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform, it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve.
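The idea of testing observed variant reads against a position-specific noise estimate under a Poisson model can be illustrated with a minimal sketch. This is not the AmpliSolve implementation. The function names, the pooling of normal samples, the error-rate floor and the significance threshold are all assumptions made for illustration only.

```python
import math


def estimate_error_rate(normal_counts, floor=1e-4):
    """Pool (alt_reads, depth) pairs from normal samples into one error rate.

    Each pair is assumed to describe a single genomic position, strand and
    alternative nucleotide. The floor is a hypothetical safeguard so that
    positions with no observed noise do not get an error rate of zero.
    """
    alt = sum(a for a, _ in normal_counts)
    depth = sum(d for _, d in normal_counts)
    if depth == 0:
        return floor
    return max(alt / depth, floor)


def _log_pmf(i, lam):
    # log of the Poisson probability mass at i, for lam > 0
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The lower tail is the smaller part here, so use the complement.
        return 1.0 - sum(math.exp(_log_pmf(i, lam)) for i in range(k))
    # For k > lam the terms decrease monotonically, so sum until negligible.
    total = 0.0
    i = k
    while True:
        term = math.exp(_log_pmf(i, lam))
        total += term
        if term <= total * 1e-16:
            break
        i += 1
    return total


def call_snv(alt_reads, depth, error_rate, alpha=0.01):
    """Flag a candidate SNV if its read count is unlikely under noise alone.

    alpha is an illustrative threshold, not a value taken from the paper.
    """
    expected_noise = depth * error_rate
    p_value = poisson_tail(alt_reads, expected_noise)
    return p_value, p_value < alpha


# Hypothetical counts from three normal samples at one position and strand.
rate = estimate_error_rate([(2, 1000), (3, 1000), (1, 1000)])
p_value, is_variant = call_snv(alt_reads=50, depth=5000, error_rate=rate)
print(rate, p_value, is_variant)
```

In this sketch a candidate is reported only when the number of supporting reads exceeds what the background noise at that exact position would plausibly produce, which is why a per-position error estimate matters more than a single global error rate.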
