Fast elimination of redundant linear equations and reconstruction of recombination-free Mendelian inheritance on a pedigree

Computational inference of haplotypes from genotypes has attracted considerable attention in the computational biology community in recent years, driven in part by the international HapMap project. In this paper, we study how to efficiently infer haplotypes from the genotypes of individuals related by a pedigree, assuming that the hereditary process was free of mutations (i.e., followed the Mendelian law of inheritance) and of recombination. The problem has recently been formulated as a system of linear equations over the finite field $\mathbb{F}_2$ and solved in $O(m^3 n^3)$ time using standard Gaussian elimination, where $m$ is the number of loci (or markers) in a genotype and $n$ is the number of individuals in the pedigree. We give a much faster algorithm with running time $O(mn^2 + n^3 \log^2 n \log\log n)$. The key ingredients of our construction are (i) a new system of linear equations based on a spanning tree of the pedigree graph and (ii) an efficient method for eliminating redundant equations in a system of $O(mn)$ linear equations over $O(n)$ variables. Although no such fast elimination method is known for general systems of linear equations, we take advantage of the underlying pedigree graph structure and of recent progress on low-stretch spanning trees.
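For reference, the baseline mentioned above, Gaussian elimination over the two-element field F(2), can be sketched in a few lines. This is a minimal, generic sketch and not the authors' spanning-tree method: it assumes each equation is packed into a Python integer bitmask, with coefficients in bits 0 to num_vars-1 and the right-hand side in bit num_vars. The names equation and gf2_eliminate are hypothetical. Besides a solution (free variables set to 0), it reports which equations are redundant in the usual linear-algebra sense, i.e. equal to a sum over F(2) of earlier equations. The toy system is illustrative only and is not derived from any pedigree.

```python
def equation(var_indices, rhs, num_vars):
    """Pack sum(x_v for v in var_indices) = rhs (mod 2) into one int bitmask.

    Hypothetical helper: bits 0..num_vars-1 are coefficients, bit num_vars is the RHS.
    """
    mask = 0
    for v in var_indices:
        mask ^= 1 << v  # XOR, so a variable listed twice cancels, as in F(2)
    return mask | ((rhs & 1) << num_vars)


def gf2_eliminate(rows, num_vars):
    """Generic Gaussian elimination over F(2), kept in reduced row echelon form.

    Returns (solution, redundant). solution is a list of 0/1 values with free
    variables set to 0, or None if a contradiction 0 = 1 is found (elimination
    stops there). redundant lists indices of equations implied by earlier ones.
    """
    rhs_bit = 1 << num_vars
    pivots = {}  # pivot column -> reduced row having a 1 only in that pivot column
    redundant = []
    for idx, row in enumerate(rows):
        # Clear every existing pivot column from the incoming row.
        for col, prow in pivots.items():
            if (row >> col) & 1:
                row ^= prow
        coeffs = row & (rhs_bit - 1)
        if coeffs == 0:
            if row & rhs_bit:
                return None, redundant  # reduced to 0 = 1: system is inconsistent
            redundant.append(idx)  # reduced to 0 = 0: a sum of earlier equations
            continue
        col = coeffs.bit_length() - 1  # new pivot: highest remaining coefficient
        # Clear the new pivot column from older pivot rows to keep the RREF invariant.
        for c in pivots:
            if (pivots[c] >> col) & 1:
                pivots[c] ^= row
        pivots[col] = row
    solution = [0] * num_vars
    for col, prow in pivots.items():
        solution[col] = (prow >> num_vars) & 1  # x_col = RHS when free vars are 0
    return solution, redundant


if __name__ == "__main__":
    n = 3
    eqs = [
        equation([0, 1], 1, n),  # x0 + x1 = 1
        equation([1, 2], 0, n),  # x1 + x2 = 0
        equation([0, 2], 1, n),  # x0 + x2 = 1  (sum of the first two)
        equation([2], 1, n),     # x2 = 1
    ]
    print(gf2_eliminate(eqs, n))  # ([0, 1, 1], [2])
```

Each incoming equation is reduced against all current pivot rows, so this is ordinary elimination that makes no use of the pedigree structure; the faster bound claimed above relies on that structure, which the sketch does not attempt to reproduce.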
