Gene mapping by haplotype pattern mining

Genetic markers are increasingly used in gene mapping. Discovering associations between markers and patient phenotypes, such as disease status, enables the identification of potential disease gene loci. The rationale is that, in diseases with a reasonable genetic contribution, affected individuals are more likely than control individuals to carry associated marker alleles near the disease susceptibility gene. We describe a new gene mapping method, haplotype pattern mining (HPM), that is based on discovering recurrent marker patterns. We define a class of useful haplotype patterns in genetic case-control data, give an algorithm for finding disease-associated haplotypes, and show how to use them to identify disease susceptibility loci. Experimental studies show that the method has good localization power in data sets with a large proportion of phenocopies and with large amounts of missing and erroneous data. We also demonstrate how the method can be used to discover several genes simultaneously.
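To make the idea of mining recurrent marker patterns concrete, the following is a minimal Python sketch and not the method's actual implementation. It assumes that patterns are contiguous runs of alleles (no gaps), that association is scored with a plain Pearson chi-square on a 2x2 case-control table, and that the locus is predicted as the marker covered by the most associated patterns. The function names, the maximum pattern length, the threshold of 3.84 and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def contiguous_patterns(haplotype, max_len):
    """Yield (start, alleles) for each contiguous window with no missing allele (None)."""
    for start in range(len(haplotype)):
        for length in range(1, max_len + 1):
            window = haplotype[start:start + length]
            if len(window) < length or None in window:
                break  # ran off the end or hit missing data
            yield (start, window)


def associated_patterns(cases, controls, max_len=3, threshold=3.84):
    """Return {pattern: score} for patterns over-represented in cases.

    Assumption: 3.84 is used only as an illustrative cut-off (about the 5%
    critical value of chi-square with one degree of freedom).
    """
    case_counts = Counter(p for h in cases for p in contiguous_patterns(h, max_len))
    control_counts = Counter(p for h in controls for p in contiguous_patterns(h, max_len))
    result = {}
    for pattern, a in case_counts.items():
        c = control_counts.get(pattern, 0)
        b, d = len(cases) - a, len(controls) - c
        score = chi_square_2x2(a, b, c, d)
        # Keep only positive association (pattern more common among cases).
        if a * d > b * c and score >= threshold:
            result[pattern] = score
    return result


def marker_frequencies(patterns, n_markers):
    """Count, for each marker, how many associated patterns cover it."""
    freq = [0] * n_markers
    for start, alleles in patterns:
        for i in range(start, start + len(alleles)):
            freq[i] += 1
    return freq


# Hypothetical toy data: 5 markers per haplotype, None marks a missing allele.
cases = [
    (1, 1, 2, 1, 1), (2, 1, 2, 1, 2), (1, 2, 2, 1, 1),
    (2, 2, 2, 1, 2), (1, 1, 2, 1, None), (2, 1, 1, 2, 1),
]
controls = [
    (1, 1, 1, 2, 1), (2, 1, 1, 2, 2), (1, 2, 1, 1, 1),
    (2, 2, 2, 2, 2), (1, 1, 1, 2, 1), (2, 1, 1, 1, 2),
]

patterns = associated_patterns(cases, controls)
freq = marker_frequencies(patterns, n_markers=5)
predicted_marker = freq.index(max(freq))  # marker covered by most associated patterns
print(sorted(patterns), freq, predicted_marker)
```

In this toy data most case haplotypes share alleles around the third and fourth markers, so the marker-wise frequency peaks there. A real analysis would also need a significance assessment for the resulting frequencies, which this sketch omits.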
