Bioinformatics, Original Paper: An Efficient Comprehensive Search Algorithm for tagSNP Selection Using Linkage Disequilibrium Criteria

MOTIVATION: Selecting SNP markers for genome-wide association studies is an important and challenging task. The goal is to minimize the number of markers selected for genotyping on a particular platform, and thereby reduce genotyping cost, while maximizing the information content provided by the selected markers.

RESULTS: We devised an improved algorithm for tagSNP selection using the pairwise r^2 criterion. We first break large marker sets down into disjoint pieces, within which more exhaustive searches can replace the greedy algorithm for tagSNP selection. These exhaustive searches generate smaller tagSNP sets. In addition, our method evaluates multiple solutions that are equivalent under the linkage disequilibrium criteria, so that additional constraints can be accommodated. Its performance was assessed using HapMap data.

AVAILABILITY: A computer program named FESTA has been developed based on this algorithm. The program is freely available and can be downloaded at http://www.sph.umich.edu/csg/qin/FESTA/
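The decomposition idea in RESULTS can be illustrated with a minimal sketch. This is not the FESTA implementation: the function names, the r^2 threshold of 0.8, the `max_exhaustive` size cutoff, and the toy r^2 matrix are all assumptions made for illustration. The sketch treats SNPs as graph nodes joined whenever pairwise r^2 meets the threshold, splits the graph into connected components (the disjoint pieces), and searches each small component exhaustively for a smallest tag set, falling back to a greedy choice for larger ones.

```python
from itertools import combinations


def ld_components(r2, threshold):
    """Split SNP indices into disjoint pieces: connected components of the
    graph that links two SNPs whenever their pairwise r^2 >= threshold."""
    n = len(r2)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if not seen[j] and r2[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(comp))
    return components


def covers(tags, comp, r2, threshold):
    """True if every SNP in comp is a tag or is in high LD with some tag."""
    return all(any(s == t or r2[s][t] >= threshold for t in tags) for s in comp)


def exhaustive_tags(comp, r2, threshold):
    """Try tag sets of increasing size; the first one that covers is minimal."""
    for k in range(1, len(comp) + 1):
        for tags in combinations(comp, k):
            if covers(tags, comp, r2, threshold):
                return list(tags)
    return list(comp)


def greedy_tags(comp, r2, threshold):
    """Repeatedly pick the SNP that captures the most still-uncovered SNPs."""
    uncovered, tags = set(comp), []
    while uncovered:
        best = max(comp, key=lambda t: sum(
            1 for s in uncovered if s == t or r2[s][t] >= threshold))
        tags.append(best)
        uncovered -= {s for s in uncovered if s == best or r2[s][best] >= threshold}
    return tags


def select_tags(r2, threshold=0.8, max_exhaustive=12):
    """Exhaustive search inside small pieces, greedy inside large ones.
    Both default values are illustrative assumptions."""
    tags = []
    for comp in ld_components(r2, threshold):
        if len(comp) <= max_exhaustive:
            tags.extend(exhaustive_tags(comp, r2, threshold))
        else:
            tags.extend(greedy_tags(comp, r2, threshold))
    return sorted(tags)


# Toy symmetric r^2 matrix for five hypothetical SNPs (made-up values).
r2 = [[1.0 if i == j else 0.1 for j in range(5)] for i in range(5)]
for i, j, value in [(0, 1, 0.9), (1, 2, 0.85), (3, 4, 0.95), (0, 2, 0.3)]:
    r2[i][j] = r2[j][i] = value

selected = select_tags(r2)
```

Because the pieces share no high-LD pairs, the tag sets found in each piece can be combined without affecting one another, which is what keeps the exhaustive step affordable. In this sketch the exhaustive search returns only the first minimal set it finds; collecting every minimal set of that size would be one way to expose the equivalent solutions mentioned in the abstract.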
