Allele Frequency Matching Between SNPs Reveals an Excess of Linkage Disequilibrium in Genic Regions of the Human Genome

Significant interest has emerged in mapping genetic susceptibility to complex traits through whole-genome association studies. These studies rely on the extent of association, i.e., linkage disequilibrium (LD), between single nucleotide polymorphisms (SNPs) across the human genome. LD describes the nonrandom association of alleles at pairs of SNPs and can be used as a metric when designing maximally informative SNP panels for association studies in human populations. Using data from the 1.58 million SNPs genotyped by Perlegen, we explored the allele frequency dependence of the LD statistic r² both empirically and theoretically. We show that average r² values between SNPs unmatched for allele frequency are always limited to values much less than 1 (theoretically, approximately 0.46 to 0.57 for this dataset). Frequency matching of SNP pairs provides a more sensitive measure for assessing the average decay of LD and yields average r² values spanning nearly the entire informative range (from 0 up to between 0.89 and 0.95). Additionally, we analyzed the extent of perfect LD (r² = 1.0) using frequency-matched SNPs and found significant differences in the extent of LD between genic and intergenic regions. The SNP pairs exhibiting perfect LD showed a significant bias toward derived, nonancestral alleles, providing evidence for positive natural selection in the human genome.
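The allele frequency dependence of r² can be illustrated with a minimal Python sketch. It assumes biallelic SNPs with known (phased) haplotype and allele frequencies and uses the standard definitions D = p_AB - p_A p_B and r² = D² / [p_A(1 - p_A) p_B(1 - p_B)], together with the textbook upper bound on r² implied by the maximum attainable D. The function names are hypothetical, and this is not the authors' analysis pipeline.

```python
from itertools import combinations


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def max_r_squared(p: float, q: float) -> float:
    """Largest r^2 attainable for two SNPs with minor allele frequencies p and q.

    With both frequencies <= 0.5, the maximum is reached when the two minor
    alleles always occur on the same haplotype, giving D_max = lo * (1 - hi)
    and r^2_max = lo * (1 - hi) / (hi * (1 - lo)).
    """
    lo, hi = min(p, q), max(p, q)
    if not (0 < lo and hi <= 0.5):
        raise ValueError("minor allele frequencies must lie in (0, 0.5]")
    return lo * (1 - hi) / (hi * (1 - lo))


def mean_max_r_squared(mafs: list[float]) -> float:
    """Average of the r^2 upper bound over all pairs drawn from a list of MAFs."""
    pairs = list(combinations(mafs, 2))
    return sum(max_r_squared(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    print(max_r_squared(0.2, 0.2))  # matched frequencies: bound is exactly 1
    print(max_r_squared(0.1, 0.4))  # unmatched: bound is about 0.167
    # Hypothetical, evenly spaced MAFs; the average bound is well below 1.
    print(mean_max_r_squared([0.05, 0.15, 0.25, 0.35, 0.45]))
```

The bound equals 1 only when p = q, so pairs of SNPs with unmatched allele frequencies cannot reach perfect LD no matter how tightly linked they are. Averaging r² over unmatched pairs is therefore capped well below 1, while restricting comparisons to frequency-matched pairs lets the statistic use its full range. The MAF values in the usage example are illustrative only and do not reproduce the Perlegen frequency spectrum.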