Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants

Significance

Whole-exome sequencing (WES) is gradually being optimized to identify mutations in increasing proportions of the protein-coding exome, but whole-genome sequencing (WGS) is becoming an attractive alternative. WGS is currently more expensive than WES, but its cost should decrease more rapidly than that of WES. We compared WES and WGS in six unrelated individuals. The distribution of quality parameters for single-nucleotide variants (SNVs) and insertions/deletions (indels) was more uniform for WGS than for WES. The vast majority of SNVs and indels were identified by both techniques, but an estimated 650 high-quality coding SNVs (∼3% of coding variants) were detected by WGS and missed by WES. WGS is therefore slightly more efficient than WES for detecting mutations in the targeted exome.

We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES, whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.
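The platform comparison described above comes down to splitting each sample's calls into variants found by both platforms, by WES only, and by WGS only. The following is a minimal sketch of that partition, not the authors' pipeline. The `Variant` key (chromosome, position, reference allele, alternate allele), the function name `partition_calls`, and the toy calls are all assumptions introduced for illustration; real call sets would first need normalization, especially for indels.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key; not taken from the paper."""
    chrom: str
    pos: int
    ref: str
    alt: str


def partition_calls(wes: Iterable[Variant], wgs: Iterable[Variant]) -> dict:
    """Split two call sets into shared and platform-exclusive variants."""
    wes_set, wgs_set = set(wes), set(wgs)
    return {
        "shared": wes_set & wgs_set,    # called by both platforms
        "wes_only": wes_set - wgs_set,  # called by WES, missed by WGS
        "wgs_only": wgs_set - wes_set,  # called by WGS, missed by WES
    }


# Toy data for illustration only; these are not real calls.
wes_calls = [
    Variant("1", 1000, "A", "G"),
    Variant("1", 2000, "C", "T"),
    Variant("2", 500, "G", "GA"),
]
wgs_calls = [
    Variant("1", 1000, "A", "G"),
    Variant("1", 2000, "C", "T"),
    Variant("3", 750, "T", "C"),
    Variant("3", 900, "G", "A"),
]

result = partition_calls(wes_calls, wgs_calls)
counts = {name: len(variants) for name, variants in result.items()}
print(counts)
```

Exact matching on position and alleles is the simplest possible criterion. The exclusive sets are only candidates: as the abstract notes, a sample of them was Sanger-sequenced to estimate how many were real.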
