Coverage and Characteristics of the Affymetrix GeneChip Human Mapping 100K SNP Set

Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting the results of such studies, we summarize here an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r² in the special case of two loci. Using this approach, we are able to summarize, for any subset of markers such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset relative to the information available in the full set of markers included in the HapMap. We also highlight circumstances in which this multilocus measure of LD provides substantial additional insight into the haplotype structure of a region beyond that given by pairwise measures of LD.
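For readers less familiar with the two-locus special case mentioned above, the following is a minimal sketch of the standard pairwise r² computation from haplotype and allele frequencies. It illustrates only the familiar pairwise measure, not the multilocus coefficient introduced in the paper; the function name `pairwise_r2` and its parameter names are assumptions chosen for illustration.

```python
def pairwise_r2(p_ab, p_a, p_b):
    """Standard pairwise LD measure r^2 for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2

    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # D is the deviation of the haplotype frequency from its
    # expectation under independence of the two loci.
    d = p_ab - p_a * p_b
    # r^2 normalizes D^2 by the product of the allelic variances.
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    # Two loci in complete LD: allele A always travels with allele B.
    print(pairwise_r2(0.5, 0.5, 0.5))
    # Two independent loci: haplotype frequency equals the product.
    print(pairwise_r2(0.25, 0.5, 0.5))
```

With equal allele frequencies of 0.5, a haplotype frequency of 0.5 corresponds to complete LD (r² of 1), while a haplotype frequency of 0.25 corresponds to independence (r² of 0).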
