Detecting Genome-wide Haplotype Polymorphism by Combined Use of Mendelian Constraints and Local Population Structure

Data from current gene-disease association studies motivate changes to existing haplotype inference methodologies. Many datasets now comprise both pedigree and population data, so it is desirable to incorporate both sources of information when inferring haplotypes. The availability of high-density SNP data also makes it possible to determine and use the precise locations of recombination events. Our proposed method reconstructs haplotype structure on a genome-wide level by jointly using information from the Mendelian law of inheritance and local population structure. The method combines, in one framework, new techniques for recombination event detection and maximum likelihood optimization of population haplotype diversity with our previous algorithm for zero-recombinant haplotype reconstruction. Experiments on both real and simulated datasets demonstrate the efficiency and accuracy of our approach in reconstructing haplotype structure. Our method makes it possible to reveal haplotypic variation on a genome-wide level.
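
To illustrate the kind of information a Mendelian constraint provides, the following is a minimal sketch for a single father-mother-child trio at biallelic SNPs. It is not the method proposed here: it ignores recombination, missing data, larger pedigrees and population structure, and the names `phase_child` and `phase_trio` are hypothetical. At each SNP, the child's genotype is split into a paternal and a maternal allele whenever the parental genotypes allow only one assignment.

```python
from typing import List, Optional, Tuple

# Unordered pair of alleles (coded 0 or 1) observed at one SNP.
Genotype = Tuple[int, int]


def phase_child(father: Genotype, mother: Genotype,
                child: Genotype) -> Optional[Tuple[int, int]]:
    """Return (paternal_allele, maternal_allele) for the child at one SNP.

    Returns None when both assignments are consistent with the parents
    (the phase is ambiguous), and raises ValueError when no assignment
    is consistent (a Mendelian inconsistency).
    """
    a, b = child
    # Try both ways of assigning the child's two alleles to the parents.
    candidates = {
        (p, m)
        for p, m in ((a, b), (b, a))
        if p in father and m in mother
    }
    if not candidates:
        raise ValueError("Mendelian inconsistency at this SNP")
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


def phase_trio(fathers: List[Genotype], mothers: List[Genotype],
               children: List[Genotype]) -> List[Optional[Tuple[int, int]]]:
    """Apply phase_child independently at every SNP (illustrative only)."""
    return [phase_child(f, m, c)
            for f, m, c in zip(fathers, mothers, children)]


if __name__ == "__main__":
    fathers = [(0, 0), (0, 1), (0, 1)]
    mothers = [(0, 1), (0, 1), (1, 1)]
    children = [(0, 1), (0, 1), (1, 1)]
    print(phase_trio(fathers, mothers, children))
```

SNPs where all three individuals are heterozygous remain unresolved under this constraint alone (`None` in the output), which is one place where additional information, such as population structure, is needed.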
