Inconsistency and features of single nucleotide variants detected in whole exome sequencing versus transcriptome sequencing: A case study in lung cancer.

Whole exome sequencing (WES) and RNA sequencing (RNA-Seq) are two of the main approaches in next-generation sequencing (NGS). Although WES is used primarily for DNA variant discovery and RNA-Seq mainly for measuring gene expression, both can be used to detect genetic variants, especially single nucleotide variants (SNVs). How consistently variants are detected by WES and RNA-Seq has not been systematically evaluated. In this study, we examined the technical and biological inconsistencies in SNV detection using WES and RNA-Seq data from 27 pairs of tumor and matched normal samples. We analyzed SNVs in three categories: WES unique (detected only in WES), RNA-Seq unique (detected only in RNA-Seq), and shared (detected in both). We found a small overlap (on average ∼14%) between the SNVs called in WES and in RNA-Seq. The WES unique SNVs were mainly due to low coverage, low expression, or their location on the non-transcribed strand in the RNA-Seq data, whereas the RNA-Seq unique SNVs were primarily due to their location outside the WES capture boundaries (accounting for ∼71%), as well as low coverage of the regions, low coverage of the mutant alleles, or RNA editing. The shared SNVs had high locus-specific coverage in both WES and RNA-Seq and high gene expression levels. Additionally, WES unique and RNA-Seq unique SNVs showed different nucleotide substitution patterns; e.g., ∼55% of RNA-Seq unique variants were A:T→G:C, a hallmark of RNA editing. This study provides an important evaluation of the inconsistencies between somatic SNVs called in WES and RNA-Seq data.
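The three-way split of SNV calls and the strand-collapsed substitution classes can be illustrated with plain set operations. The sketch below is not the study's pipeline. It assumes that calls are matched by exact (chromosome, position, reference, alternate) identity, and that "overlap" means shared calls divided by all calls made on either platform; the abstract does not state its denominator. The names `SNV`, `categorize`, `overlap_fraction`, `substitution_class`, and `class_fraction` are hypothetical, and the toy calls are invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class SNV(NamedTuple):
    """A single nucleotide variant keyed by position and alleles (hypothetical key)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def categorize(wes: Iterable[SNV], rna: Iterable[SNV]) -> dict[str, set[SNV]]:
    """Split calls into WES unique, RNA-Seq unique, and shared sets by exact match."""
    wes_set, rna_set = set(wes), set(rna)
    return {
        "wes_unique": wes_set - rna_set,
        "rna_unique": rna_set - wes_set,
        "shared": wes_set & rna_set,
    }


def overlap_fraction(cats: dict[str, set[SNV]]) -> float:
    """Shared calls over the union of calls (an assumed definition of overlap)."""
    total = sum(len(s) for s in cats.values())
    return len(cats["shared"]) / total if total else 0.0


_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a base change with its reverse complement into one of six classes,
    so that A>G and T>C both map to 'A:T→G:C'."""
    # Orient the change so the reference base is A or C (A:T / C:G pair convention).
    if ref in ("G", "T"):
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return f"{ref}:{_COMPLEMENT[ref]}→{alt}:{_COMPLEMENT[alt]}"


def class_fraction(snvs: Iterable[SNV], target: str = "A:T→G:C") -> float:
    """Fraction of SNVs in a set that fall into one substitution class."""
    counts = Counter(substitution_class(v.ref, v.alt) for v in snvs)
    total = sum(counts.values())
    return counts[target] / total if total else 0.0


# Toy usage with invented calls: one shared, two WES unique, two RNA-Seq unique.
wes_calls = [SNV("chr7", 100, "C", "T"), SNV("chr12", 200, "G", "T"), SNV("chr1", 300, "A", "G")]
rna_calls = [SNV("chr7", 100, "C", "T"), SNV("chr1", 400, "T", "C"), SNV("chr1", 500, "A", "G")]
cats = categorize(wes_calls, rna_calls)
print(overlap_fraction(cats))                # 0.2
print(class_fraction(cats["rna_unique"]))    # 1.0
```

Exact-key matching is the simplest choice; a real comparison would also need to handle calls that fall outside the WES capture regions or in unexpressed genes, which is exactly what drives the platform-unique categories described above.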
