TreeDT: gene mapping by tree disequilibrium test

We introduce and evaluate TreeDT, a novel gene mapping method based on discovering and assessing tree-like patterns in genetic marker data. Gene mapping aims to discover a statistical connection between a particular disease or trait and a narrow region of the genome. In a typical case-control setting, the data consist of genetic markers typed for a set of disease-associated chromosomes and a set of control chromosomes. A computer scientist would view this data as a set of strings.

TreeDT extracts information about the historical recombinations in the population, essentially in the form of substrings and prefix trees. This information is used to locate fragments potentially inherited from a common diseased founder, and to map the disease gene into the most likely such fragment. For each chromosomal location, the method measures the disequilibrium of the prefix tree of marker strings starting from that location, in order to assess the distribution of disease-associated chromosomes.

We evaluate the performance of TreeDT experimentally on realistic, simulated data sets. Comparisons with state-of-the-art methods (TDT, HPM) show that TreeDT is very competitive.
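To make the string view concrete, the following is a minimal sketch of the general idea, not the TreeDT algorithm itself. It assumes haplotypes are given as equal-length strings of allele symbols with a case/control flag for each, builds a prefix tree of the marker strings starting from each location, and scores each location by how unevenly cases and controls are distributed over the tree. The names `Node`, `build_prefix_tree`, `max_node_score`, and `scan` are hypothetical, and the per-node 2x2 chi-square is a stand-in statistic chosen for illustration; the abstract does not specify the disequilibrium measure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Prefix-tree node counting the chromosomes that pass through it."""
    cases: int = 0
    controls: int = 0
    children: dict = field(default_factory=dict)


def build_prefix_tree(haplotypes, labels, start):
    """Insert the part of every marker string from `start` onward into a trie.

    `labels[i]` is True when haplotypes[i] is disease-associated.
    """
    root = Node(cases=sum(labels), controls=len(labels) - sum(labels))
    for hap, is_case in zip(haplotypes, labels):
        node = root
        for allele in hap[start:]:
            node = node.children.setdefault(allele, Node())
            if is_case:
                node.cases += 1
            else:
                node.controls += 1
    return root


def chi_square(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (0.0 if degenerate)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_node_score(root):
    """Assumed stand-in score: the strongest case/control imbalance at any node."""
    best = 0.0
    stack = list(root.children.values())
    while stack:
        node = stack.pop()
        # Chromosomes sharing this prefix versus all remaining chromosomes.
        score = chi_square(node.cases, node.controls,
                           root.cases - node.cases,
                           root.controls - node.controls)
        best = max(best, score)
        stack.extend(node.children.values())
    return best


def scan(haplotypes, labels):
    """Score every marker location; higher scores suggest stronger association."""
    n_markers = len(haplotypes[0])
    return [max_node_score(build_prefix_tree(haplotypes, labels, start))
            for start in range(n_markers)]


if __name__ == "__main__":
    haps = ["1221", "1222", "2112", "2111"]
    is_case = [True, True, False, False]
    print(scan(haps, is_case))
```

In this toy input the two case chromosomes share the prefix `122`, so locations whose trees separate that shared fragment from the controls score highest. A real analysis would also need a significance assessment, which this sketch omits.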
