Evaluation of Hybridization Capture Versus Amplicon‐Based Methods for Whole‐Exome Sequencing

Next‐generation sequencing has aided characterization of genomic variation. While whole‐genome sequencing may capture all possible mutations, whole‐exome sequencing remains cost‐effective and captures most phenotype‐altering mutations. Initial strategies for exome enrichment utilized a hybridization‐based capture approach. Recently, amplicon‐based methods were designed to simplify preparation and utilize smaller DNA inputs. We evaluated two hybridization capture‐based and two amplicon‐based whole‐exome sequencing approaches, utilizing both Illumina and Ion Torrent sequencers, comparing on‐target alignment, uniformity, and variant calling. While the amplicon methods had higher on‐target rates, the hybridization capture‐based approaches demonstrated better uniformity. All methods identified many of the same single‐nucleotide variants, but each amplicon‐based method missed variants detected by the other three methods and reported additional variants discordant with all three other technologies. Many of these potential false positives or negatives appear to result from limited coverage, low variant frequency, vicinity to read starts/ends, or the need for platform‐specific variant calling algorithms. All methods demonstrated effective copy‐number variant calling when evaluated against a single‐nucleotide polymorphism array. This study illustrates some differences between whole‐exome sequencing approaches, highlights the need for selecting appropriate variant calling based on capture method, and will aid laboratories in selecting their preferred approach.
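The three comparison metrics named above (on-target alignment, uniformity, and variant concordance across methods) can be illustrated with a minimal Python sketch. It is not the study's pipeline. The function names, the toy data, and the uniformity definition (fraction of target bases covered at no less than 0.2 times the mean depth) are assumptions chosen for illustration, since the abstract does not state how each metric was computed.

```python
from statistics import mean


def on_target_rate(read_starts, targets):
    """Fraction of reads whose start falls inside any half-open target interval."""
    hits = sum(any(start <= pos < end for start, end in targets) for pos in read_starts)
    return hits / len(read_starts)


def uniformity(depths, fraction=0.2):
    """Fraction of target bases with depth >= fraction * mean depth (assumed definition)."""
    threshold = fraction * mean(depths)
    return sum(d >= threshold for d in depths) / len(depths)


def discordance(calls, method):
    """Return (missed, unique) variant sets for one method.

    missed: variants called by every other method but not by `method`.
    unique: variants called by `method` and by no other method.
    """
    others = [variants for name, variants in calls.items() if name != method]
    missed = set.intersection(*others) - calls[method]
    unique = calls[method] - set.union(*others)
    return missed, unique


# Toy data (hypothetical): two target intervals and five read start positions.
targets = [(100, 200), (300, 400)]
read_starts = [110, 150, 250, 310, 450]
rate = on_target_rate(read_starts, targets)

# Toy per-base depths over the target region.
depths = [10, 10, 10, 10, 0]
unif = uniformity(depths)

# Toy variant calls keyed by (chrom, pos, ref, alt); method names are placeholders.
v1, v2, v3, v4 = (("chr1", p, "A", "G") for p in (1000, 2000, 3000, 4000))
calls = {
    "capture_a": {v1, v2, v3},
    "capture_b": {v1, v2, v3},
    "amplicon_a": {v1, v3, v4},
    "amplicon_b": {v1, v2, v3},
}
missed, unique = discordance(calls, "amplicon_a")
```

In this toy data, `missed` holds the candidate false negative for the placeholder method `amplicon_a` and `unique` holds its candidate false positive, mirroring the two kinds of discordance the abstract describes for the amplicon-based methods.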
