Strelka2: Fast and accurate variant calling for clinical sequencing applications

We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small-variant calling method for clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of indel error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperforms current leading tools in variant calling accuracy and compute cost.
