Establishing reference samples for detection of somatic mutations and germline variants with NGS technologies

We characterized two reference samples for next-generation sequencing (NGS) technologies: a human triple-negative breast cancer cell line and a matched normal cell line. By combining several whole-genome sequencing (WGS) platforms, multiple sequencing replicates, and orthogonal mutation-detection bioinformatics pipelines, we minimized potential biases from sequencing technologies, assays, and informatics. Our “truth sets” were defined using evidence from 21 replicate WGS runs with coverage ranging from 50X to 100X (140 billion reads in total). These “truth sets” contain many relevant variants and mutations, including 193 COSMIC mutations and 9,016 germline variants from the ClinVar database, nonsense mutations in BRCA1/2, and missense mutations in TP53 and FGFR1. Independent validation in three orthogonal experiments served as a successful stress test of the truth sets. We expect these reference materials and “truth sets” to facilitate assay development, qualification, validation, and proficiency testing. In addition, our methods can be extended to establish new, fully characterized reference samples for the community.
