A parsimonious tree-grow method for haplotype inference

MOTIVATION: Haplotype information has become increasingly important in analyzing fine-scale molecular genetics data, with applications such as disease gene mapping and drug design. Parsimony haplotyping is one of the haplotyping problems and is NP-hard.

RESULTS: In this paper, we develop a novel algorithm for the haplotype inference problem under the parsimony criterion, based on a parsimonious tree-grow method (PTG). PTG is a heuristic algorithm that can find the minimum number of distinct haplotypes based on the criterion of keeping all genotypes resolved during the tree-grow process. A block-partitioning method is also proposed to improve computational efficiency. We show that the proposed approach is not only effective, with high accuracy, but also very efficient, with a computational complexity of O(m^2 n) time for n single nucleotide polymorphism sites in m individual genotypes.

AVAILABILITY: The software is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/ptg/

CONTACT: chen@elec.osaka-sandai.ac.jp

SUPPLEMENTARY INFORMATION: Supporting materials are available from http://zhangroup.aporc.org/bioinfo/ptg/bti572supplementary.pdf
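
To make the parsimony criterion concrete, the following is a minimal sketch of the underlying problem, not of PTG itself. It assumes a common encoding that the abstract does not specify: each genotype is a string over {0, 1, 2}, where 0 and 1 denote homozygous sites and 2 denotes a heterozygous site, and each haplotype is a string over {0, 1}. The function names `resolutions` and `min_haplotypes` are hypothetical. The search is exhaustive and exponential, so it is only usable on tiny instances; it illustrates why a heuristic is needed for realistic data.

```python
from itertools import product


def resolutions(genotype):
    """Yield every unordered haplotype pair that resolves `genotype`.

    Assumed encoding: '0'/'1' = homozygous site, '2' = heterozygous site.
    """
    het = [i for i, s in enumerate(genotype) if s == "2"]
    base = [s if s != "2" else "0" for s in genotype]
    if not het:
        # No heterozygous site: the genotype is two copies of one haplotype.
        h = "".join(base)
        yield (h, h)
        return
    # Fix the first heterozygous site to '0' in h1 so that mirrored
    # pairs (h1, h2) and (h2, h1) are not both generated.
    for bits in product("01", repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for site, b in zip(het, ("0",) + bits):
            h1[site] = b
            h2[site] = "1" if b == "0" else "0"
        yield ("".join(h1), "".join(h2))


def min_haplotypes(genotypes):
    """Brute-force pure parsimony: pick one resolving pair per genotype
    so that the set of distinct haplotypes used is as small as possible.

    Returns (set_of_haplotypes, tuple_of_chosen_pairs).
    """
    best = None
    options = [list(resolutions(g)) for g in genotypes]
    for choice in product(*options):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, choice)
    return best


if __name__ == "__main__":
    haplotypes, pairs = min_haplotypes(["02", "21", "22"])
    print(sorted(haplotypes), pairs)
```

In this toy instance the genotype `22` can be resolved either as (`00`, `11`) or as (`01`, `10`); the first choice reuses haplotypes already required by the other two genotypes, so it is the parsimonious one.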
