Pure Parsimony Xor Haplotyping

Haplotype resolution from xor-genotype data has recently been formulated as a new model for genetic studies. Xor-genotype data are a cheaply obtainable type of data that distinguish heterozygous from homozygous sites without identifying the homozygous alleles. In this paper, we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We give exact solutions to the problem by providing polynomial-time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial-time k-approximation, where k is the maximum number of xor-genotypes containing a given single nucleotide polymorphism (SNP). Finally, we propose a heuristic and present an experimental analysis showing that it scales to large real-world instances taken from the HapMap project.
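To make the model concrete, the following Python sketch encodes a haplotype as a 0/1 tuple over the SNPs and an xor-genotype as the bitwise XOR of a haplotype pair, so bit i is 1 exactly when SNP i is heterozygous. It then solves the pure parsimony version by exhaustive search: find a smallest haplotype set such that every xor-genotype equals the XOR of some pair in the set. This is only an illustration of the problem statement under these encoding assumptions, not the polynomial-time, fixed-parameter, or approximation algorithms of the paper; the function names are hypothetical and the search is exponential in the number of SNPs.

```python
from itertools import combinations


def xor_genotype(h1, h2):
    """Xor-genotype of a haplotype pair: bit i is 1 iff SNP i is heterozygous."""
    return tuple(a ^ b for a, b in zip(h1, h2))


def explains(haplotypes, genotype):
    """True if some (possibly identical) pair of haplotypes yields this xor-genotype."""
    return any(xor_genotype(h1, h2) == genotype
               for h1 in haplotypes for h2 in haplotypes)


def min_xor_haplotyping(genotypes):
    """Hypothetical brute-force pure parsimony solver for tiny instances.

    XORing every haplotype with a fixed vector preserves all pairwise xors,
    so some optimal solution contains the all-zero haplotype; we fix it and
    try sets of increasing size built from the remaining 2**m - 1 vectors.
    """
    m = len(genotypes[0])
    zero = (0,) * m
    candidates = [tuple((x >> i) & 1 for i in range(m)) for x in range(1, 2 ** m)]
    for extra in range(len(candidates) + 1):
        for chosen in combinations(candidates, extra):
            hs = (zero,) + chosen
            if all(explains(hs, g) for g in genotypes):
                return list(hs)
    return None  # not reached: all 2**m haplotypes resolve any genotype
```

For example, the xor-genotypes `(1,1,0)`, `(0,1,1)` and `(1,0,1)` need three haplotypes: two haplotypes `{0, h}` produce only the xors `0` and `h`, whereas a third haplotype can supply all three because `(1,1,0)` XOR `(0,1,1)` equals `(1,0,1)`.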
