Efficient Algorithms for Reconstructing Zero-Recombinant Haplotypes on a Pedigree Based on Fast Elimination of Redundant Linear Equations

Computational inference of haplotypes from genotypes has attracted a great deal of attention in the computational biology community recently, partly driven by the international HapMap project. In this paper, we study how to efficiently infer haplotypes from the genotypes of individuals related by a pedigree, assuming that inheritance is free of mutations (i.e., it obeys the Mendelian law of inheritance) and of recombinants. The problem has recently been formulated as a system of linear equations over the finite field $F(2)$ and solved in $O(m^3n^3)$ time using standard Gaussian elimination, where $m$ is the number of loci (or markers) in a genotype and $n$ is the number of individuals in the pedigree. We give a much faster algorithm with running time $O(mn^2+n^3\log^2n\log\log n)$. The key ingredients of our construction are (i) a new system of linear equations based on a spanning tree of the pedigree graph and (ii) an efficient method for eliminating redundant equations in a system of $O(mn)$ linear equations over $O(n)$ variables. Although no such fast elimination method is known for general systems of linear equations, we take advantage of the underlying pedigree graph structure and of recent progress on low-stretch spanning trees.
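To make the baseline concrete, the following is a minimal sketch of standard Gaussian elimination over $F(2)$ that discards redundant equations as it goes. It illustrates only the generic technique the abstract compares against, not the faster spanning-tree-based method of the paper. The function names (`reduce_gf2`, `solve_gf2`) and the bitmask encoding of equations are assumptions made for illustration; how pedigree genotypes are turned into equations is not shown.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: an equation is (mask, rhs), where bit i of mask is 1
# if variable x_i appears in the equation, and rhs is 0 or 1.
Equation = Tuple[int, int]


def reduce_gf2(equations: List[Equation]) -> Optional[Dict[int, Equation]]:
    """Keep one equation per pivot variable; drop redundant equations.

    Returns a dict mapping pivot index -> (mask, rhs), or None if the
    system is inconsistent (some equation reduces to 0 = 1).
    """
    basis: Dict[int, Equation] = {}
    for mask, rhs in equations:
        while mask:
            pivot = mask.bit_length() - 1  # highest variable still present
            if pivot not in basis:
                basis[pivot] = (mask, rhs)  # independent equation: keep it
                break
            bmask, brhs = basis[pivot]
            mask ^= bmask  # addition over F(2) is XOR
            rhs ^= brhs
        else:
            # The equation reduced to 0 = rhs: redundant if rhs == 0,
            # contradictory if rhs == 1.
            if rhs:
                return None
    return basis


def solve_gf2(equations: List[Equation], num_vars: int) -> Optional[List[int]]:
    """Return one solution (free variables set to 0), or None if none exists."""
    basis = reduce_gf2(equations)
    if basis is None:
        return None
    x = 0  # bit i of x holds the value assigned to x_i
    for pivot in sorted(basis):  # lower-indexed variables are fixed first
        mask, rhs = basis[pivot]
        lower = mask & ~(1 << pivot)
        parity = bin(lower & x).count("1") & 1
        if rhs ^ parity:
            x |= 1 << pivot
    return [(x >> i) & 1 for i in range(num_vars)]


if __name__ == "__main__":
    # x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (the third is the sum of the first two)
    system = [(0b011, 1), (0b110, 0), (0b101, 1)]
    print(len(reduce_gf2(system)), solve_gf2(system, 3))
```

In this sketch each incoming equation is reduced against the equations already kept, so a system with many redundant equations still pays for every reduction. The abstract's contribution is precisely to avoid that cost by exploiting the pedigree graph structure.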
