Optimal Haplotype Block-Free Selection of Tagging SNPs for Genome-Wide Association Studies

It is widely hoped that the study of sequence variation in the human genome will provide a means of elucidating the genetic component of complex diseases and variable drug responses. A major stumbling block to the successful design and execution of genome-wide disease association studies using single-nucleotide polymorphisms (SNPs) and linkage disequilibrium is the enormous number of SNPs in the human genome. This number makes exhaustive genotyping unacceptably costly and presents a challenging problem of statistical inference. Here, we present a new method for optimally selecting minimum informative subsets of SNPs, also known as "tagging" SNPs, that is efficient for genome-wide selection. We contrast this method with published methods, including haplotype block tagging, in which SNPs are grouped into segments of low haplotype diversity and a subset of the SNPs that can discriminate all common haplotypes within each block is typed. Because our method does not rely on a predefined haplotype block structure and makes use of the weaker correlations that occur across neighboring blocks, it can be applied effectively across chromosomal regions with both high and low local linkage disequilibrium. We show that the number of tagging SNPs selected is substantially smaller than previously reported using block-based approaches and that selecting tagging SNPs optimally can result in a two- to threefold savings over selecting random SNPs.
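To make the idea of block-free tagging concrete, the following minimal sketch picks tagging SNPs directly from pairwise correlations, with no block partition. It is an illustration under stated assumptions, not the method of this paper: it uses a simple greedy heuristic (which is not guaranteed to be optimal), the pairwise r² statistic as the measure of linkage disequilibrium, phased haplotypes coded as 0/1, and an r² cutoff of 0.8. The function names `r_squared` and `greedy_tags` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs coded 0/1 over the same haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP is uncorrelated with everything
    d = pab - pa * pb  # the disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Greedy, block-free tag selection (illustrative heuristic, not optimal).

    haplotypes: list of equal-length 0/1 lists, one list per chromosome.
    Returns the sorted indices of the SNPs chosen as tags.
    """
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]

    # covers[j] = SNPs that SNP j can stand in for (always including itself).
    covers = {j: {j} for j in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if r_squared(cols[i], cols[j]) >= threshold:
            covers[i].add(j)
            covers[j].add(i)

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs; ties go to the lowest index.
        best = max(range(n_snps), key=lambda j: len(covers[j] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return sorted(tags)


# Toy data: SNPs 0, 1 and 2 are perfectly correlated; SNP 3 is independent.
haps = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
print(greedy_tags(haps))  # [0, 3]
```

In the toy data, one tag stands in for the three mutually correlated SNPs and the independent SNP must be typed itself, so two of the four SNPs suffice. Because coverage is computed over all SNP pairs rather than within predefined segments, correlations that span what a block-based method would treat as a boundary can still be used.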
