A Linear-Time Algorithm for the Perfect Phylogeny Haplotyping (PPH) Problem

Since the Perfect Phylogeny Haplotyping (PPH) problem was introduced at RECOMB 2002 [15], finding a deterministic, worst-case linear-time solution for it has remained open, despite broad interest in the problem and a series of papers on various aspects of it. In this paper we solve the open problem, giving a practical, deterministic linear-time algorithm based on a simple data structure and simple operations on it. The method is straightforward to program and has been fully implemented. Simulations show that it is much faster in practice than prior methods. The value of a linear-time solution to the PPH problem is partly conceptual and partly practical: it can serve in the inner loop of algorithms for more complex problems, where the PPH problem must be solved repeatedly.

[1]  Ron Shamir, et al. The Incomplete Perfect Phylogeny Haplotype Problem, 2005, J. Bioinform. Comput. Biol.

[2]  Shibu Yooseph, et al. Combinatorial Problems Arising in SNP and Haplotype Analysis, 2003, DMTCS.

[3]  R. Hudson. Gene genealogies and the coalescent process, 1990.

[4]  Richard R. Hudson, et al. Generating samples under a Wright-Fisher neutral model of genetic variation, 2002, Bioinform.

[5]  L. Helmuth. Map of the Human Genome 3.0, 2001, Science.

[6]  R. Karp, et al. Efficient reconstruction of haplotype structure via perfect phylogeny, 2002, Journal of Bioinformatics and Computational Biology.

[7]  Shibu Yooseph, et al. A Note on Efficient Computation of Haplotypes via Perfect Phylogeny, 2004, J. Comput. Biol.

[8]  Richard M. Karp, et al. Perfect phylogeny and haplotype assignment, 2004, RECOMB '04.

[9]  Robert E. Bixby, et al. An Almost Linear-Time Algorithm for Graph Realization, 1988, Math. Oper. Res.

[10]  Dan Gusfield, et al. An Overview of Combinatorial Methods for Haplotype Inference, 2002, Computational Methods for SNPs and Haplotype Inference.

[11]  Peter Damaschke. Fast Perfect Phylogeny Haplotype Inference, 2003, FCT.

[12]  Jens Gramm, et al. Perfect Path Phylogeny Haplotyping with Missing Data Is Fixed-Parameter Tractable, 2004, IWPEC.

[13]  Carsten Wiuf, et al. Inference on Recombination and Block Structure Using Unphased Data, 2004, Genetics.

[14]  Dan Gusfield, et al. Empirical Exploration of Perfect Phylogeny Haplotyping and Haplotypers, 2003, COCOON.

[15]  Shibu Yooseph, et al. A Survey of Computational Methods for Determining Haplotypes, 2002, Computational Methods for SNPs and Haplotype Inference.

[16]  Dan Gusfield, et al. Haplotyping as perfect phylogeny: conceptual framework and efficient solutions, 2002, RECOMB '02.

[17]  Dan Gusfield, et al. Perfect phylogeny haplotyper: haplotype inferral using a tree model, 2003, Bioinform.

[18]  Ron Shamir, et al. Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs, 2004, CPM.

[19]  Eran Halperin, et al. Haplotype reconstruction from genotype data using Imperfect Phylogeny, 2004, Bioinform.

[20]  Shibu Yooseph, et al. Haplotyping as Perfect Phylogeny: A Direct Approach, 2003, J. Comput. Biol.

[21]  P. Damaschke. Incremental haplotype inference, phylogeny, and almost bipartite graphs, 2004, RECOMB 2004.