Exome Sequencing of a Multigenerational Human Pedigree

Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily on effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available that target approximately 33 Mb, or ∼180,000 coding exons, across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation pedigree. We covered up to 98% of the targeted bases at a long-read sequence read depth of ≥3 and 86% at a read depth of ≥10, and over 50% of all targets were covered with ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. With the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele, measured against Illumina 1 M BeadChip genotypes, was 95.2% at ≥10x sequence coverage. Further, we propose a genotype calling strategy for low-coverage targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. This method detected >99% of SNPs covered at ≥8x. Our results offer guidance for “real-world” applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.
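The idea of deriving depth-specific cut-offs from existing genotype data can be illustrated with a minimal sketch. This is not HCDiff and not the authors' actual procedure, which the abstract does not specify. It assumes that each training site is reduced to a read depth, a count of variant-supporting reads, and a yes/no flag for whether the array genotype carries the variant allele. The function names `calibrate_thresholds` and `call_variant`, the grid of candidate cut-offs, and the use of plain concordance as the selection criterion are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def calibrate_thresholds(sites, candidate_cutoffs=None):
    """Pick, for each read depth, the variant-read-fraction cut-off that
    agrees best with known array genotypes.

    sites: iterable of (depth, variant_reads, array_has_variant) tuples.
    Returns a dict mapping depth -> chosen cut-off.
    """
    if candidate_cutoffs is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95
        candidate_cutoffs = [i / 20 for i in range(1, 20)]

    # Group the observed variant read fractions by coverage depth.
    by_depth = defaultdict(list)
    for depth, variant_reads, has_variant in sites:
        if depth > 0:
            by_depth[depth].append((variant_reads / depth, has_variant))

    thresholds = {}
    for depth, observations in by_depth.items():
        best_cutoff, best_correct = None, -1
        for cutoff in candidate_cutoffs:
            # Count sites where the sequencing call matches the array.
            correct = sum(
                (fraction >= cutoff) == has_variant
                for fraction, has_variant in observations
            )
            # Strict comparison keeps the lowest cut-off among ties.
            if correct > best_correct:
                best_cutoff, best_correct = cutoff, correct
        thresholds[depth] = best_cutoff
    return thresholds


def call_variant(depth, variant_reads, thresholds):
    """Return True/False for a variant call, or None if the depth was
    never seen during calibration."""
    cutoff = thresholds.get(depth)
    if cutoff is None or depth == 0:
        return None
    return variant_reads / depth >= cutoff


# Toy training data at 8x coverage (made-up values, for illustration only).
training_sites = [
    (8, 0, False),
    (8, 1, False),
    (8, 3, True),
    (8, 4, True),
    (8, 8, True),
]
thresholds = calibrate_thresholds(training_sites)
```

In this sketch a site at a depth with no training data returns `None` rather than a call, which keeps uncalibrated depths from being genotyped silently. A real analysis would need far more training sites per depth and would likely distinguish heterozygous from homozygous calls.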
