Algorithm for Haplotype Inference via Galled-Tree Networks with Simple Galls

The problem of determining haplotypes from genotypes has gained considerable prominence in the research community. The focus here is on determining the sets of SNP values on individual chromosomes, since this information helps to capture the genetic causes of diseases. The most efficient algorithmic tool for haplotyping is based on perfect phylogenetic trees. A drawback of this method is that it cannot be applied when the data contain homoplasies (multiple mutations of the same character) or recombinations. Recently, Song et al. (2005) studied two cases: haplotyping via imperfect phylogenies with a single homoplasy, and haplotyping via galled-tree networks with one gall. In Gupta et al. (2010), we showed that haplotyping via galled-tree networks is NP-hard, even when restricted to the case in which every gall contains at most 3 mutations. We present a polynomial-time algorithm for haplotyping via galled-tree networks with simple galls (each having two mutations) for genotype matrices that satisfy a natural condition; this condition is implied by the presence of at least one 1 in each column that contains a 2. Finally, we give experimental results comparing our algorithm with PHASE on simulated data.
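The column condition mentioned above can be illustrated with a minimal sketch. It assumes the common genotype encoding in which 0 and 1 mark homozygous sites and 2 marks a heterozygous site; the abstract does not spell out the encoding, so this is an assumption. The function names `combine` and `columns_missing_a_one` are hypothetical and are not taken from the paper. The sketch checks only the stated sufficient condition (every column containing a 2 also contains a 1), not the more general natural condition, and it is not the haplotyping algorithm itself.

```python
from typing import List, Sequence


def combine(h1: Sequence[int], h2: Sequence[int]) -> List[int]:
    """Build a genotype row from two haplotypes (assumed 0/1/2 encoding).

    Sites where the haplotypes agree keep their value; sites where they
    differ are heterozygous and are written as 2.
    """
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def columns_missing_a_one(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Return indices of columns that contain a 2 but no 1.

    An empty result means the matrix meets the sufficient condition
    stated in the abstract.
    """
    bad = []
    for j, column in enumerate(zip(*genotypes)):
        if 2 in column and 1 not in column:
            bad.append(j)
    return bad


if __name__ == "__main__":
    matrix = [
        [2, 0, 1],
        [1, 2, 0],
        [0, 2, 2],
    ]
    print(columns_missing_a_one(matrix))
    print(combine([0, 1, 1], [1, 1, 0]))
```

A matrix for which `columns_missing_a_one` returns an empty list satisfies the sufficient condition; a non-empty list only shows that this particular check fails, and the matrix may still satisfy the more general condition.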

[1] P. Donnelly, et al. A new statistical method for haplotype reconstruction from population data, 2001, American Journal of Human Genetics.

[2] Richard R. Hudson, et al. Generating samples under a Wright-Fisher neutral model of genetic variation, 2002, Bioinform.

[3] Dan Gusfield, et al. Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination, 2004, J. Bioinform. Comput. Biol.

[4] Yun S. Song, et al. Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event, 2005, WABI.

[5] Shibu Yooseph, et al. A Survey of Computational Methods for Determining Haplotypes, 2002, Computational Methods for SNPs and Haplotype Inference.

[6] M. Daly, et al. High-resolution haplotype structure in the human genome, 2001, Nature Genetics.

[7] J. Hein. A heuristic method to reconstruct the history of sequences subject to recombination, 1993, Journal of Molecular Evolution.

[8] J. Hein. Reconstructing evolution of sequences subject to recombination using parsimony, 1990, Mathematical Biosciences.

[9] Kaizhong Zhang, et al. Perfect Phylogenetic Networks with Recombination, 2001, J. Comput. Biol.

[10] S. P. Fodor, et al. Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21, 2001, Science.

[11] A. Chakravarti, et al. Haplotype inference in random population samples, 2002, American Journal of Human Genetics.

[12] Russell Schwartz, et al. Optimal imperfect phylogeny reconstruction and haplotyping (IPPH), 2006, Computational Systems Bioinformatics Conference.

[13] Dan Gusfield, et al. An Overview of Combinatorial Methods for Haplotype Inference, 2002, Computational Methods for SNPs and Haplotype Inference.

[14] David E. Housman, et al. Digital genotyping and haplotyping with polymerase colonies, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[15] Ján Manuch, et al. Algorithm for Haplotype Inferring Via Galled-Tree Networks with Simple Galls, 2007, ISBRA.

[16] Dan Gusfield, et al. Haplotyping as perfect phylogeny: conceptual framework and efficient solutions, 2002, RECOMB '02.

[17] Ján Manuch, et al. Haplotype inferring via galled-tree networks using a hypergraph covering problem for special genotype matrices, 2009, Discret. Appl. Math.

[18] Peter Donnelly, et al. A comparison of bayesian methods for haplotype reconstruction from population genotype data, 2003, American Journal of Human Genetics.

[19] Paola Bonizzoni, et al. The Haplotyping problem: An overview of computational models and solutions, 2003, Journal of Computer Science and Technology.

[20] Dan Gusfield, et al. Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination, 2005, J. Comput. Syst. Sci.

[21] Yun S. Song, et al. On the minimum number of recombination events in the evolutionary history of DNA sequences, 2004, Journal of Mathematical Biology.

[22] Ján Manuch, et al. Characterization of the Existence of Galled-Tree Networks, 2006, APBC.

[23] Ján Manuch, et al. Haplotype Inferring Via Galled-Tree Networks Is NP-Complete, 2008, COCOON.

[24] Shibu Yooseph, et al. Haplotyping as Perfect Phylogeny: A Direct Approach, 2003, J. Comput. Biol.