Haplotype reconstruction from SNP fragments by minimum error correction

MOTIVATION Haplotype reconstruction based on aligned single nucleotide polymorphism (SNP) fragments is to infer a pair of haplotypes from localized polymorphism data gathered through short genome fragment assembly. An important computational model of this problem is the minimum error correction (MEC) model, which has been mentioned in several literatures. The model retrieves a pair of haplotypes by correcting minimum number of SNPs in given genome fragments coming from an individual's DNA. RESULTS In the first part of this paper, an exact algorithm for the MEC model is presented. Owing to the NP-hardness of the MEC model, we also design a genetic algorithm (GA). The designed GA is intended to solve large size problems and has very good performance. The strength and weakness of the MEC model are shown using experimental results on real data and simulation data. In the second part of this paper, to improve the MEC model for haplotype reconstruction, a new computational model is proposed, which simultaneously employs genotype information of an individual in the process of SNP correction, and is called MEC with genotype information (shortly, MEC/GI). Computational results on extensive datasets show that the new model has much higher accuracy in haplotype reconstruction than the pure MEC model.

[1]  G. Lancia,et al.  Algorithmic Strategies for the SNP Haplotype Assembly Problem , 2002 .

[2]  Russell Schwartz,et al.  Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem , 2002, Briefings Bioinform..

[3]  Eric S. Lander,et al.  An SNP map of the human genome generated by reduced representation shotgun sequencing , 2000, Nature.

[4]  L. Helmuth Genome research: map of the human genome 3.0. , 2001, Science.

[5]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[6]  S. Steele It's raining , 1998 .

[7]  L. Helmuth Map of the Human Genome 3.0 , 2001, Science.

[8]  K K Kidd,et al.  Sequence variability and candidate gene analysis in complex disease: association of mu opioid receptor gene variation with substance dependence. , 2000, Human molecular genetics.

[9]  Dan Gusfield,et al.  Haplotyping as perfect phylogeny: conceptual framework and efficient solutions , 2002, RECOMB '02.

[10]  M. Rieder,et al.  Sequence variation in the human angiotensin converting enzyme , 1999, Nature Genetics.

[11]  A. Chakravarti It's raining SNPs, hallelujah? , 1998, Nature Genetics.

[12]  M. Daly,et al.  High-resolution haplotype structure in the human genome , 2001, Nature Genetics.

[13]  Harvey J. Greenberg,et al.  Opportunities for Combinatorial Optimization in Computational Biology , 2004, INFORMS J. Comput..

[14]  Russell Schwartz,et al.  SNPs Problems, Complexity, and Algorithms , 2001, ESA.

[15]  K. Weiss,et al.  Linkage disequilibrium mapping of complex disease: fantasy or reality? , 1998, Current opinion in biotechnology.

[16]  J. Stephens,et al.  Haplotype Variation and Linkage Disequilibrium in 313 Human Genes , 2001, Science.

[17]  Lusheng Wang,et al.  Haplotype inference by maximum parsimony , 2003, Bioinform..

[18]  Keinosuke Fukunaga,et al.  A Branch and Bound Clustering Algorithm , 1975, IEEE Transactions on Computers.

[19]  Giuseppe Lancia,et al.  Practical Algorithms and Fixed-Parameter Tractability for the Single Individual SNP Haplotyping Problem , 2002, WABI.

[20]  A. Clark,et al.  Inference of haplotypes from PCR-amplified samples of diploid populations. , 1990, Molecular biology and evolution.

[21]  Eran Halperin,et al.  Haplotype reconstruction from genotype data using Imperfect Phylogeny , 2004, Bioinform..