TreeDT: tree pattern mining for gene mapping

We describe TreeDT, a novel association-based gene mapping method. Given a set of disease-associated haplotypes and a set of control haplotypes, TreeDT predicts likely locations of a disease susceptibility gene. TreeDT extracts information about historical recombinations in the population, essentially in the form of haplotype trees: a haplotype tree constructed at a given chromosomal location is an estimate of the genealogy of the haplotypes at that location. TreeDT constructs these trees for all locations on the given haplotypes and performs a novel disequilibrium test on each tree: is there a small set of subtrees with relatively high proportions of disease-associated chromosomes, suggesting that those chromosomes share genetic history and that the location is a likely disease gene location? We give a detailed description of TreeDT and the tree disequilibrium tests, analyze the algorithm formally, and evaluate its performance experimentally on both simulated and real data sets. Experimental results demonstrate that TreeDT is highly accurate on difficult mapping tasks, and comparisons with other methods (EATDT, HPM, TDT) show that TreeDT is very competitive.
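The abstract describes the tree construction and the test only at a high level. The Python sketch below shows one way the two steps could fit together. It is an illustration under our own assumptions, not TreeDT's published procedure. The assumptions are as follows:

- Haplotypes are complete and phased.
- The tree at a gap between two adjacent markers is a prefix tree that groups haplotypes by the alleles they share when read outward from the gap, one direction at a time.
- Each subtree is scored by the one-sided hypergeometric probability of containing at least its observed number of disease-associated chromosomes.
- The per-tree statistic is the largest sum of -log p over at most K disjoint subtrees.
- Significance comes from permuting the case/control labels.

All function names (`prefix_tree`, `tree_statistic`, `treedt_scan`, `permutation_pvalue`) are hypothetical.

```python
# Hypothetical sketch: the tree construction, subtree score and permutation
# scheme are illustrative assumptions, not TreeDT's published definitions.
import math
import random


def hypergeom_upper_tail(a, n, A, N):
    """P(X >= a), X ~ Hypergeometric: n draws from N chromosomes, A of them cases."""
    favourable = sum(math.comb(A, x) * math.comb(N - A, n - x)
                     for x in range(a, min(n, A) + 1))
    return favourable / math.comb(N, n)


def prefix_tree(haplotypes, gap, direction):
    """Group haplotypes by the alleles they share, read outward from `gap`.

    `gap` lies between markers gap-1 and gap. Each key is a prefix of alleles
    and maps to the indices of haplotypes carrying it; the empty prefix is the
    root and holds every haplotype.
    """
    m = len(haplotypes[0])
    order = range(gap, m) if direction > 0 else range(gap - 1, -1, -1)
    nodes = {}
    for i, h in enumerate(haplotypes):
        prefix = ()
        nodes.setdefault(prefix, []).append(i)
        for j in order:
            prefix += (h[j],)
            nodes.setdefault(prefix, []).append(i)
    return nodes


def tree_statistic(nodes, is_case, K):
    """Largest total -log p over at most K disjoint subtrees (root excluded)."""
    N, A = len(is_case), sum(is_case)
    children = {}
    for p in nodes:
        if p:
            children.setdefault(p[:-1], []).append(p)

    def solve(p):
        best = [0.0] * (K + 1)  # best[k]: at most k disjoint subtrees under p
        for c in children.get(p, []):
            sub = solve(c)
            # Knapsack-style merge: split the k subtrees between siblings.
            best = [max(best[i] + sub[k - i] for i in range(k + 1))
                    for k in range(K + 1)]
        if p:  # choosing p itself uses one subtree and excludes its descendants
            members = nodes[p]
            a = sum(is_case[i] for i in members)
            s = -math.log(hypergeom_upper_tail(a, len(members), A, N))
            best = [best[0]] + [max(b, s) for b in best[1:]]
        return best

    return solve(())[K]


def treedt_scan(haplotypes, is_case, K=2):
    """Statistic at every gap between adjacent markers, best of both directions."""
    m = len(haplotypes[0])
    return [(gap, max(tree_statistic(prefix_tree(haplotypes, gap, d), is_case, K)
                      for d in (+1, -1)))
            for gap in range(1, m)]


def permutation_pvalue(haplotypes, is_case, K=2, rounds=200, seed=0):
    """Compare the best observed gap with the best gap under shuffled labels."""
    rng = random.Random(seed)
    observed = max(s for _, s in treedt_scan(haplotypes, is_case, K))
    labels = list(is_case)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(labels)
        if max(s for _, s in treedt_scan(haplotypes, labels, K)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    # Made-up toy data; label 1 marks a disease-associated haplotype.
    haps = [[0, 1, 1, 0, 1, 0], [1, 1, 1, 0, 1, 1], [0, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 1],
            [0, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 1], [0, 1, 1, 1, 0, 1], [1, 1, 0, 0, 0, 0]]
    case = [1, 1, 1, 1, 0, 0, 0, 0]
    for gap, stat in treedt_scan(haps, case, K=2):
        print(f"gap {gap}: {stat:.3f}")
    print("p =", permutation_pvalue(haps, case, K=2, rounds=99))
```

A small dynamic program over the tree chooses the disjoint subtrees. Each tree therefore costs roughly O(nodes · K²) beyond scoring. In each permutation, the sketch takes the maximum statistic over all gaps. This is one common way to account for testing many locations at once. The recursion depth grows with the number of markers, so very long haplotypes would need an iterative traversal.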
