A linear-time algorithm for reconstructing zero-recombinant haplotype configuration on pedigrees without mating loops

With the launch of the international HapMap project, the haplotype inference problem has attracted a great deal of attention in the computational biology community recently. In this paper, we study the question of how to efficiently infer haplotypes from genotypes of individuals related by a pedigree without mating loops, assuming that the hereditary process was free of mutations (i.e. the Mendelian law of inheritance) and recombinants. We model the haplotype inference problem as a system of linear equations as in (Xiao et al. in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’07), pp. 655–664, 2007) and present an (optimal) linear-time (i.e.O(mn) time) algorithm to generate a particular solution to the haplotype inference problem, where m is the number of loci (or markers) in a genotype and n is the number of individuals in the pedigree. Moreover, the algorithm also provides a general solution in O(mn2) time, which is optimal because the descriptive size of a general solution could be as large as Θ(mn2). The key ingredients of our construction are (i) a fast consistency checking procedure for the system of linear equations introduced in (Xiao et al. 2007) based on a careful investigation of the relationship between the equations (ii) a novel linear-time method for solving linear equations without invoking the Gaussian elimination method. Although such a fast method for solving equations is not known for general systems of linear equations, we take advantage of the underlying loop-free pedigree graph and some special properties of the linear equations.