Multiplex padlock targeted sequencing reveals human hypermutable CpG variations.

Utilizing the full power of next-generation sequencing often requires large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several enrichment technologies have recently been developed, but they still await substantial improvement. We report a 10,000-fold improvement of a previously developed padlock-based approach and apply the assay to identify genetic variations in hypermutable CpG regions across human chromosome 21. From approximately 3 million reads derived from a single Illumina Genome Analyzer lane, approximately 94% (approximately 50,500) of the target sites were observed with at least one read. The uniformity of coverage was also greatly improved: up to 93% and 57% of all targets fell within a 100-fold and a 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs); concordance with independently obtained genotypes was 98.4%-100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. The fraction of polymorphic sites is lower in CpG-rich regions, and it correlates more strongly with human-chimpanzee divergence at CpG sites than at non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. These results suggest that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
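
The uniformity figures above (share of targets within a 100-fold or 10-fold coverage range) can be illustrated with a minimal Python sketch. The abstract does not define the metric precisely, so this assumes one plausible reading: the largest fraction of all targets whose read counts fit inside a single window [c, c × fold]. The function name `best_fold_range_fraction` and the example counts are hypothetical and are not taken from the study.

```python
def best_fold_range_fraction(counts, fold):
    """Largest fraction of all targets whose read counts fit in one [c, c * fold] window.

    Assumed definition (not stated in the abstract): targets with zero reads
    cannot fall in any fold range but still count in the denominator.
    """
    if not counts:
        return 0.0
    covered = sorted(c for c in counts if c > 0)
    best = 0
    left = 0
    # Slide a window over the sorted counts; shrink it from the left
    # whenever the smallest count times `fold` no longer reaches the largest.
    for right, high in enumerate(covered):
        while covered[left] * fold < high:
            left += 1
        best = max(best, right - left + 1)
    return best / len(counts)


# Hypothetical per-target read counts, for illustration only.
example_counts = [0, 1, 2, 5, 10, 20, 150, 1000]
print(best_fold_range_fraction(example_counts, 10))   # 0.5
print(best_fold_range_fraction(example_counts, 100))  # 0.625
```

Under this reading, a wider fold range can only capture the same or a larger share of targets, which matches the ordering of the reported 93% (100-fold) and 57% (10-fold) values.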
