Genome-wide somatic variant calling using localized colored de Bruijn graphs

Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. We demonstrate, through extensive experimental comparison on synthetic and real whole-genome sequencing datasets, that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system, which is essential for variant prioritization, and detects low-frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local-assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet.Giuseppe Narzisi et al. present Lancet, a genome-wide somatic variant caller using localized colored de Bruijn graphs. Comparisons using real and simulated data show that Lancet has improved accuracy for single nucleotide variants and indels compared to widely used methods MuTect2, LoFreq and Strelka2.

[1]  Michael C. Schatz,et al.  The Challenge of Small-Scale Repeats for Indel Discovery , 2015, Front. Bioeng. Biotechnol..

[2]  A. Wilm,et al.  LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets , 2012, Nucleic acids research.

[3]  A global reference for human genetic variation , 2015, Nature.

[4]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer , 2011, Nature Biotechnology.

[5]  Christina Boucher,et al.  Succinct Colored de Bruijn Graphs , 2016, bioRxiv.

[6]  Ryan M. Layer,et al.  LUMPY: a probabilistic framework for structural variant discovery , 2012, Genome Biology.

[7]  Xiaoyu Chen,et al.  Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications , 2016, Bioinform..

[8]  Jens Stoye,et al.  Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage , 2016, Algorithms for Molecular Biology.

[9]  Michael C. Schatz,et al.  SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips , 2014, Bioinform..

[10]  Wendy S. W. Wong,et al.  Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs , 2012, Bioinform..

[11]  Konrad Scheffler,et al.  Strelka2: Fast and accurate variant calling for clinical sequencing applications , 2017, bioRxiv.

[12]  O. Hofmann,et al.  VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research , 2016, Nucleic acids research.

[13]  M. Schatz,et al.  Accurate detection of de novo and transmitted indels within exome-capture data using micro-assembly , 2014, Nature Methods.

[14]  G. McVean,et al.  De novo assembly and genotyping of variants using colored de Bruijn graphs , 2011, Nature Genetics.

[15]  Michael C. Heinold,et al.  A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing , 2015, Nature Communications.

[16]  Vladimir Vacic,et al.  Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions , 2014, Genome Biology.

[17]  A. Sivachenko,et al.  Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples , 2013, Nature Biotechnology.

[18]  Gonçalo R. Abecasis,et al.  The variant call format and VCFtools , 2011, Bioinform..

[19]  Joshua M. Stuart,et al.  Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection , 2015, Nature Methods.

[20]  Thomas Zichner,et al.  DELLY: structural variant discovery by integrated paired-end and split-read analysis , 2012, Bioinform..

[21]  Aaron R. Quinlan,et al.  BamTools: a C++ API and toolkit for analyzing and managing BAM files , 2011, Bioinform..

[22]  Aaron R. Quinlan,et al.  BIOINFORMATICS APPLICATIONS NOTE , 2022 .

[23]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[24]  G. McVean,et al.  Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications , 2014, Nature Genetics.

[25]  Gabor T. Marth,et al.  A global reference for human genetic variation , 2015, Nature.

[26]  Miriam K. Konkel,et al.  Tangram: a comprehensive toolbox for mobile element insertion detection , 2014, BMC Genomics.