An Almost Linear Time Algorithm for a General Haplotype Solution on Tree Pedigrees with no Recombination and its Extensions

We study the haplotype inference problem from pedigree data under the zero-recombination assumption, which is well supported by real data for tightly linked markers (i.e., single nucleotide polymorphisms (SNPs)) over a relatively large chromosome segment. We solve the problem rigorously by formulating genotype constraints as a linear system of inheritance variables. We then use disjoint-set structures to encode connectivity information among individuals, to detect constraints from genotypes, and to check the consistency of constraints. On a tree pedigree without missing data, our algorithm outputs a general solution, together with the total number of specific solutions, in nearly linear time O(mn · α(n)), where m is the number of loci, n is the number of individuals, and α is the inverse Ackermann function; this improves on existing algorithms. We also extend the idea to looped pedigrees and pedigrees with missing data by considering existing (partial) constraints on inheritance variables. The algorithm has been implemented in C++ and will be incorporated into our PedPhase package. Experimental results show that it correctly identifies all 0-recombinant solutions with great efficiency. Comparisons with two other popular algorithms show that the proposed algorithm achieves 10- to 10^5-fold improvements over a variety of parameter settings. The experimental study also provides empirical evidence for the complexity bounds suggested by the theoretical analysis.
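The abstract does not give the data structure itself. As a rough illustration of how a disjoint-set forest can check the consistency of binary linear constraints, the sketch below handles constraints of the assumed form x_a + x_b = c (mod 2) using the standard "union-find with parity" technique. The class name `ParityDisjointSet` and its methods are hypothetical and are not taken from the paper or from PedPhase; the authors' actual constraint system over inheritance variables may differ.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch: disjoint-set forest with parity tracking.
// Variables are indexed 0..n-1 and take values in {0, 1}.
// Each constraint is assumed to have the form x_a + x_b = c (mod 2).
class ParityDisjointSet {
public:
    explicit ParityDisjointSet(std::size_t n)
        : parent_(n), rank_(n, 0), parity_(n, 0), components_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    // Returns (root of v, x_v XOR x_root), compressing the path on the way.
    std::pair<std::size_t, int> find(std::size_t v) {
        if (parent_[v] == v) return {v, 0};
        std::pair<std::size_t, int> up = find(parent_[v]);
        parity_[v] ^= up.second;  // now relative to the root
        parent_[v] = up.first;
        return {up.first, parity_[v]};
    }

    // Adds x_a + x_b = c (mod 2). Returns false if it contradicts
    // the constraints already recorded, true otherwise.
    bool addConstraint(std::size_t a, std::size_t b, int c) {
        std::pair<std::size_t, int> fa = find(a);
        std::pair<std::size_t, int> fb = find(b);
        std::size_t ra = fa.first, rb = fb.first;
        int pa = fa.second, pb = fb.second;
        if (ra == rb) return (pa ^ pb) == c;  // already linked: just check
        if (rank_[ra] < rank_[rb]) std::swap(ra, rb);  // union by rank
        parent_[rb] = ra;
        parity_[rb] = pa ^ pb ^ c;  // x_rb XOR x_ra implied by the constraint
        if (rank_[ra] == rank_[rb]) ++rank_[ra];
        --components_;
        return true;
    }

    // Number of independent groups of variables; in this toy model each
    // group contributes one free binary choice.
    std::size_t components() const { return components_; }

private:
    std::vector<std::size_t> parent_;
    std::vector<int> rank_;
    std::vector<int> parity_;  // x_v XOR x_parent(v)
    std::size_t components_;
};
```

With union by rank and path compression, a sequence of such operations runs in time proportional to the number of operations times α(n), which is where an inverse-Ackermann factor like the one in the stated bound typically comes from. In this toy model, a consistent system over k remaining components has 2^k solutions; the paper's own solution count is defined on its inheritance-variable system and is not reproduced here.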
