Intersect-then-combine approach: improving the performance of somatic variant calling in whole exome sequencing data using multiple aligners and callers

Bioinformatic analysis of genomic sequencing data to identify somatic mutations in cancer samples is still far from achieving the required robustness and standardisation. In this study, we generated a whole exome sequencing benchmark dataset using the platinum genome sample NA12878 and developed an intersect-then-combine (ITC) approach to increase the accuracy of calling single nucleotide variants (SNVs) and indels in tumour-normal pairs. We evaluated the effect of alignment, base quality recalibration, mutation caller and filtering on sensitivity and false positive rate. The ITC approach increased sensitivity by up to 17.1% without increasing the false positive rate per megabase (FPR/Mb), and its validity was confirmed in a set of clinical samples.
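The abstract names the ITC approach but does not spell out its steps. The sketch below is one plausible reading, given as an illustration under stated assumptions rather than the authors' implementation: for each caller, keep only the variants found with every aligner (intersect), then take the union of those agreed sets across callers (combine). The variant tuple layout, the function names, and the caller and aligner labels are all hypothetical, as is the formula for FPR/Mb (false positives divided by the targeted region size in megabases).

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def intersect_then_combine(calls: Dict[str, Dict[str, Set[Variant]]]) -> Set[Variant]:
    """Assumed ITC reading: calls[caller][aligner] is a set of variants.

    Step 1 (intersect): per caller, keep variants reported with every aligner.
    Step 2 (combine): take the union of those per-caller sets.
    """
    combined: Set[Variant] = set()
    for by_aligner in calls.values():
        if not by_aligner:
            continue  # caller with no alignments contributes nothing
        agreed = set.intersection(*by_aligner.values())
        combined |= agreed
    return combined


def sensitivity(called: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of truth-set variants that were called."""
    return len(called & truth) / len(truth) if truth else 0.0


def fpr_per_mb(called: Set[Variant], truth: Set[Variant], target_bp: int) -> float:
    """False positive calls per megabase of targeted sequence (assumed definition)."""
    return len(called - truth) / (target_bp / 1_000_000)


# Toy data with placeholder caller and aligner names.
v1 = ("chr1", 100, "A", "T")
v2 = ("chr1", 200, "G", "C")
v3 = ("chr2", 300, "C", "CT")
v4 = ("chr2", 400, "T", "G")
v5 = ("chr3", 500, "AG", "A")
v6 = ("chr3", 600, "C", "A")

calls = {
    "caller_a": {"aligner_x": {v1, v2, v3}, "aligner_y": {v1, v2}},
    "caller_b": {"aligner_x": {v2, v4}, "aligner_y": {v2, v4, v5}},
}
truth = {v1, v2, v6}

itc_calls = intersect_then_combine(calls)
itc_sensitivity = sensitivity(itc_calls, truth)
itc_fpr = fpr_per_mb(itc_calls, truth, target_bp=2_000_000)
```

Under this reading, the intersection step removes calls that depend on a single aligner, while the union step recovers variants that only one caller detects, which is consistent with the reported gain in sensitivity at an unchanged FPR/Mb.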
