Choosing SNPs using feature selection

A major challenge for genome-wide disease association studies is the high cost of genotyping a large number of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as "tagging" SNPs, that capture most of the variation in a population. Considerable research interest has recently focused on developing methods for finding such SNPs. In this paper, we present an efficient method for finding tagging SNPs. The method does not involve a computationally intensive search over SNP subsets; instead, it discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here is not limited to correlations between SNPs in local groups. By using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that our method selects fewer tagging SNPs than block-based methods do.
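The idea of discarding redundant SNPs by correlation, rather than searching over subsets, can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a simple greedy correlation filter. The function names `r_squared` and `select_tag_snps`, the 0/1/2 genotype coding, the toy data, and the 0.8 threshold are all assumptions made for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov * cov / (var_x * var_y)


def select_tag_snps(genotypes, threshold=0.8):
    """Greedy filter (illustrative assumption, not the paper's method).

    genotypes: list of rows, one per individual, each a list of 0/1/2 codes.
    A SNP is kept only if its r^2 with every already-kept SNP is below
    the threshold. All pairs are compared, not just neighbouring SNPs.
    """
    n_snps = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_snps)]
    kept = []
    for j in range(n_snps):
        if len(set(columns[j])) < 2:
            continue  # monomorphic SNP carries no variation; skip it
        if all(r_squared(columns[j], columns[k]) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: 6 individuals x 4 SNPs (hypothetical values).
# SNP 1 duplicates SNP 0, and SNP 2 is its mirror image (r^2 = 1 in both cases).
genotypes = [
    [0, 0, 2, 0],
    [1, 1, 1, 0],
    [2, 2, 0, 1],
    [0, 0, 2, 1],
    [1, 1, 1, 2],
    [2, 2, 0, 2],
]
tags = select_tag_snps(genotypes)
print(tags)  # [0, 3]
```

Because every SNP is compared against all previously kept SNPs regardless of position, the filter can drop a SNP that is redundant with a distant one, which is the kind of global redundancy a block-based approach would not see. The result depends on the order in which SNPs are visited, so this sketch makes no claim of returning a minimal set.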
