Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event

The haplotype inference (HI) problem is to infer n haplotype pairs (2n haplotypes) from n observed genotype vectors. It is a key problem in the study of genetic variation in populations, for example in the ongoing HapMap project [5]. To have any hope of recovering the haplotypes that actually generated the observed genotypes, we must use some (implicit or explicit) genetic model of the evolution of the underlying haplotypes. The Perfect Phylogeny Haplotyping (PPH) model was introduced in 2002 [9] to reflect the “neutral coalescent” or “perfect phylogeny” model of haplotype evolution. The PPH problem, which can be solved in polynomial time, is to determine whether there is an HI solution whose inferred haplotypes can be derived on a perfect phylogeny (tree). Since its introduction, several extensions and modifications of the PPH model have been examined. The most important modification, made to model biological reality better, allows a limited number of biological events that violate the perfect phylogeny model. This was accomplished implicitly in [7,12] by adding several heuristics to an algorithm for the PPH problem [8]; those heuristics are invoked when the genotype data cannot be explained by haplotypes that fit the perfect phylogeny model. In this paper, we address the issue explicitly by allowing one recombination or homoplasy event in the model of haplotype evolution. We formalize the two resulting problems and provide a polynomial-time solution for one of them, using an additional, empirically supported assumption. For the second problem, we present a related framework that yields a practical algorithm, and we believe that this problem can also be solved in polynomial time.
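To make the HI and PPH definitions concrete, the following is a minimal brute-force sketch for tiny instances only; it is not the polynomial-time PPH algorithm referred to above, and it does not handle recombination or homoplasy. It assumes the common 0/1/2 genotype encoding (2 marks a heterozygous site), and it tests for a perfect phylogeny with the standard four-gamete condition, taking the root as not fixed in advance. The function names are hypothetical and chosen for illustration.

```python
from itertools import combinations, product


def resolutions(genotype):
    """Yield every unordered haplotype pair (h1, h2) that explains a genotype.

    Assumed encoding: 0 and 1 are homozygous sites; 2 is a heterozygous
    site, where the two haplotypes must differ.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        yield tuple(genotype), tuple(genotype)
        return
    # Pin h1 to 0 at the first heterozygous site so that (h1, h2) and
    # (h2, h1) are not both generated.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, (0,) + bits):
            h1[i], h2[i] = b, 1 - b
        yield tuple(h1), tuple(h2)


def fits_perfect_phylogeny(haplotypes):
    """Four-gamete test: no pair of sites shows all of 00, 01, 10, 11."""
    num_sites = len(haplotypes[0])
    for a, b in combinations(range(num_sites), 2):
        if len({(h[a], h[b]) for h in haplotypes}) == 4:
            return False
    return True


def brute_force_pph(genotypes):
    """Return one HI solution that fits a perfect phylogeny, or None.

    Exponential in the number of heterozygous sites; for illustration only.
    """
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haplotypes = [h for pair in choice for h in pair]
        if fits_perfect_phylogeny(haplotypes):
            return list(choice)
    return None
```

For example, with genotypes (2, 2), (0, 1), and (1, 0), the resolution of the first genotype into (0, 0) and (1, 1) would create all four gametes, so the sketch returns the resolution into (0, 1) and (1, 0) instead. With the four homozygous genotypes (0, 0), (0, 1), (1, 0), and (1, 1), no perfect phylogeny exists and the function returns None; instances of this kind are the ones that motivate allowing a single homoplasy or recombination event.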

[1] Dan Gusfield, et al. Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination, 2004, J. Bioinform. Comput. Biol.

[2] Ron Shamir, et al. Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs, 2004, CPM.

[3] Eran Halperin, et al. Haplotype reconstruction from genotype data using Imperfect Phylogeny, 2004, Bioinform.

[4] Dan Gusfield, et al. A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem, 2005, RECOMB.

[5] Dan Gusfield, et al. Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination, 2005, J. Comput. Syst. Sci.

[6] Shibu Yooseph, et al. Haplotyping as Perfect Phylogeny: A Direct Approach, 2003, J. Comput. Biol.

[7] Yun S. Song, et al. Constructing Minimal Ancestral Recombination Graphs, 2005, J. Comput. Biol.

[8] Dan Gusfield, et al. Empirical Exploration of Perfect Phylogeny Haplotyping and Haplotypers, 2003, COCOON.

[9] P. Donnelly, et al. A new statistical method for haplotype reconstruction from population data, 2001, American journal of human genetics.

[10] R. Karp, et al. Efficient reconstruction of haplotype structure via perfect phylogeny, 2002, Journal of bioinformatics and computational biology.

[11] Dan Gusfield, et al. A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem, 2005, RECOMB.

[12] Rakefet Rosenfeld. Calculating the secrets of life, 1995, Nature.

[13] Richard R. Hudson, et al. Generating samples under a Wright-Fisher neutral model of genetic variation, 2002, Bioinform.

[14] Richard M. Karp, et al. Large scale reconstruction of haplotypes from genotype data, 2003, RECOMB '03.

[15] Dan Gusfield, et al. Haplotyping as perfect phylogeny: conceptual framework and efficient solutions, 2002, RECOMB '02.

[16] A. Chakravarti, et al. Haplotype inference in random population samples, 2002, American journal of human genetics.

[17] Kevin Barraclough, et al. I and i, 2001, BMJ : British Medical Journal.

[18] J. Hein. Reconstructing evolution of sequences subject to recombination using parsimony, 1990, Mathematical biosciences.

[19] M. Steel, et al. Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees, 2001.

[20] R. Hudson. Gene genealogies and the coalescent process, 1990.

[21] Yun S. Song. On the Combinatorics of Rooted Binary Phylogenetic Trees, 2003.