An Integer Programming Approach for the Selection of Tag SNPs Using Multi-allelic LD

Single Nucleotide Polymorphisms (SNPs) are common among human populations. SNPs that lie close together within a small region of a human chromosome are generally so strongly correlated that a subset of them, termed tag SNPs, can provide enough information to infer the neighboring SNPs. Such correlations are known as linkage disequilibrium (LD) and are measured either pair-wise, as with r², or multi-to-one (multi-marker). For any given set of SNPs, a variety of algorithms have been proposed to identify a subset of tag SNPs from which the remaining SNPs can be inferred. This paper focuses on finding the minimum number of tag SNPs from which the remaining SNPs can be inferred through multi-allelic LD or through pair-wise LD with a pre-defined r² threshold. We call this the optimal tag SNP selection problem. Although this problem is NP-hard in theory, it can be formulated as an integer programming (IP) problem under a certain constraint, and the optimal solution can be found efficiently by our newly developed IPMarker program. In addition, the flexibility of the computational framework allows us to formulate and solve the problem of finding common tag SNPs for multiple populations that have different LD patterns. Various datasets, including ENCODE and the Major Histocompatibility Complex (MHC) region, were used to evaluate the performance of IPMarker. We also extended IPMarker to the whole-genome HapMap Phase I data. Results showed that IPMarker significantly reduces the number of tag SNPs required compared with Haploview, the most widely used program, although it requires a significantly longer running time. Overall, genotyping a selected set of tag SNPs is thus the most cost-effective way to conduct large-scale genome-wide association studies.
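For the pair-wise case, the optimal tag SNP selection problem can be written as a set-cover integer program. The formulation below is a generic sketch under stated assumptions, not necessarily IPMarker's exact model: the symbols are introduced here for illustration. Take SNPs 1, …, n and populations 1, …, P; let x_j = 1 if SNP j is chosen as a tag, and let C_i^(p) be the set containing SNP i itself plus every SNP j whose r² with SNP i in population p reaches the threshold θ. The multi-marker LD constraints and the "certain constraint" the abstract mentions are not reproduced. With P = 1 this is the single-population problem; P > 1 requires the same tags to cover every SNP in every population.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{j=1}^{n} x_j \\
\text{s.t.} \quad & \sum_{j \in C_i^{(p)}} x_j \ge 1, && i = 1,\dots,n,\; p = 1,\dots,P, \\
& x_j \in \{0,1\}, && j = 1,\dots,n, \\
\text{where} \quad & C_i^{(p)} = \{i\} \cup \{\, j : r^2_{ij,p} \ge \theta \,\}.
\end{aligned}
```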
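A minimal sketch of the pair-wise r² criterion follows. It computes r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) from phased haplotypes coded 0/1 and then finds a smallest tag set by exhaustive search. On small inputs, this exhaustive search reaches the same optimum an IP solver would on a set-cover formulation, but it scales only to a handful of SNPs. The function names (`r_squared`, `coverage_sets`, `min_tag_snps`) and the 0/1 coding are hypothetical choices for illustration. This is not IPMarker, and multi-marker LD is not modeled.

```python
from itertools import combinations
from typing import List, Sequence, Set


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pair-wise r^2 between two biallelic SNPs coded 0/1 across phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # Monomorphic SNPs have an undefined r^2; treat them as uncorrelated.
    return 0.0 if denom == 0 else d * d / denom


def coverage_sets(haplotypes: List[List[int]], threshold: float) -> List[Set[int]]:
    """cover[i] = SNPs that can tag SNP i (r^2 >= threshold); every SNP tags itself."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    cover = []
    for i in range(n_snps):
        c = {i}
        for j in range(n_snps):
            if j != i and r_squared(columns[i], columns[j]) >= threshold:
                c.add(j)
        cover.append(c)
    return cover


def min_tag_snps(n_snps: int, constraints: List[Set[int]]) -> List[int]:
    """Smallest tag set intersecting every constraint set (exhaustive search by size)."""
    for k in range(0, n_snps + 1):
        for tags in combinations(range(n_snps), k):
            chosen = set(tags)
            if all(chosen & c for c in constraints):
                return list(tags)
    raise ValueError("no tag set satisfies every constraint")


if __name__ == "__main__":
    # Four haplotypes (rows) over four SNPs (columns); SNPs 0-2 are in perfect LD.
    haps = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1], [1, 1, 0, 0]]
    cover = coverage_sets(haps, threshold=0.8)
    print(min_tag_snps(4, cover))  # → [0, 3]
```

To look for common tags across populations, one could build `coverage_sets` from each population's haplotypes (with the same SNP order) and pass the concatenated constraint lists, for example `min_tag_snps(n, cover_a + cover_b)`. This mirrors the multi-population idea in spirit but is an assumption, not the paper's procedure.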
