Algorithm for haplotype resolution and block partitioning for partial XOR-genotype data

Haplotyping and block partitioning have been studied extensively for regular genotype data, but the more cost-efficient XOR-genotype data remain under-investigated. Previous studies developed methods for haplotyping short-sequence partial XOR-genotypes. In this paper we propose a new algorithm that haplotypes long-range partial XOR-genotype data, which may contain missing entries, and simultaneously finds the block structure of the given data. The method is implemented as a fast and practical algorithm. We also investigate how the percentage of fully genotyped individuals in a sample affects the accuracy of the results, with and without missing data. The algorithm is validated by testing on HapMap data. The results show good prediction rates for samples both with and without missing data. The accuracy of prediction at XOR sites is not significantly affected by the presence of 10% or less missing data.
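To make the data type concrete, the following minimal sketch shows what an XOR-genotype records: for each SNP site, only whether the individual's two haplotypes differ (heterozygous, 1) or agree (homozygous, 0). It is an illustration of the input data only, not the algorithm proposed in the paper. The function names, the 0/1 allele encoding, and the use of `None` for a missing entry are assumptions made for this example.

```python
from typing import List, Optional, Set


def xor_genotype(h1: List[int], h2: List[int]) -> List[int]:
    """Per-site XOR of two binary haplotypes: 1 = heterozygous, 0 = homozygous."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of sites")
    return [a ^ b for a, b in zip(h1, h2)]


def mask_missing(xg: List[int], missing_sites: Set[int]) -> List[Optional[int]]:
    """Replace the entries at the given site indices with None (assumed missing marker)."""
    return [None if i in missing_sites else v for i, v in enumerate(xg)]


def is_consistent(h1: List[int], h2: List[int],
                  observed: List[Optional[int]]) -> bool:
    """Check that a candidate haplotype pair explains every non-missing XOR entry."""
    return all(o is None or o == (a ^ b) for a, b, o in zip(h1, h2, observed))


if __name__ == "__main__":
    h1 = [0, 1, 1, 0]
    h2 = [0, 0, 1, 1]
    xg = xor_genotype(h1, h2)          # sites 1 and 3 are heterozygous
    observed = mask_missing(xg, {2})   # pretend site 2 was not typed
    print(xg, observed, is_consistent(h1, h2, observed))
```

Note that the bitwise-complemented pair `[1, 0, 0, 1]` and `[1, 1, 0, 0]` yields the same XOR-genotype, so XOR data alone cannot tell the two pairs apart. This ambiguity is one reason a sample typically also includes some fully genotyped individuals, as the abstract discusses.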
