Comparison of solution-based exome capture methods for next generation sequencing

BackgroundTechniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.ResultsWe developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.ConclusionsLibraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.

[1]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[2]  Ying Cheng,et al.  The European Nucleotide Archive , 2010, Nucleic Acids Res..

[3]  Dennis C. Friedrich,et al.  A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries , 2011, Genome Biology.

[4]  G. Weinstock,et al.  Direct selection of human genomic loci by microarray hybridization , 2007, Nature Methods.

[5]  Ewan Birney,et al.  Automated generation of heuristics for biological sequence comparison , 2005, BMC Bioinformatics.

[6]  Christian Gilissen,et al.  De novo mutations of SETBP1 cause Schinzel-Giedion syndrome , 2010, Nature Genetics.

[7]  David Haussler,et al.  The UCSC Known Genes , 2006, Bioinform..

[8]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[9]  A. Bowcock,et al.  Frequent Mutation of BAP1 in Metastasizing Uveal Melanomas , 2010, Science.

[10]  Anne E Carpenter,et al.  Visualization of image data from cells to organisms , 2010, Nature Methods.

[11]  K. Lindblad-Toh,et al.  Performance of Microarray and Liquid Based Capture Methods for Target Enrichment for Massively Parallel Sequencing and SNP Discovery , 2011, PloS one.

[12]  Emily H Turner,et al.  Target-enrichment strategies for next-generation sequencing , 2010, Nature Methods.

[13]  M. DePristo,et al.  A framework for variation discovery and genotyping using next-generation DNA sequencing data , 2011, Nature Genetics.

[14]  Jamie K Teer,et al.  Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. , 2010, Genome research.

[15]  Kai Ye,et al.  Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads , 2009, Bioinform..

[16]  Richard Durbin,et al.  Fast and accurate long-read alignment with Burrows–Wheeler transform , 2010, Bioinform..

[17]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer , 2011, Nature Biotechnology.

[18]  M. Metzker Sequencing technologies — the next generation , 2010, Nature Reviews Genetics.

[19]  Z. Xuan,et al.  Genome-wide in situ exon capture for selective resequencing , 2007, Nature Genetics.

[20]  I. Longden,et al.  EMBOSS: the European Molecular Biology Open Software Suite. , 2000, Trends in genetics : TIG.

[21]  J. Kitzman,et al.  Whole exome capture in solution with 3 Gbp of data , 2010, Genome Biology.

[22]  Hanlee P. Ji,et al.  Next-generation DNA sequencing , 2008, Nature Biotechnology.

[23]  E. Mardis The impact of next-generation sequencing technology on genetics. , 2008, Trends in genetics : TIG.

[24]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[25]  Steven J. M. Jones,et al.  Circos: an information aesthetic for comparative genomics. , 2009, Genome research.

[26]  Jonathan M. Mudge,et al.  The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. , 2009, Genome research.

[27]  J. Maguire,et al.  Solution Hybrid Selection with Ultra-long Oligonucleotides for Massively Parallel Targeted Sequencing , 2009, Nature Biotechnology.