Self-organizing map approaches for the haplotype assembly problem

Haplotype assembly aims to reconstruct a pair of haplotypes from the SNP values observed in a set of individual DNA fragments. In this paper, we study the minimum error correction (MEC) model for the haplotype assembly problem and explore self-organizing map (SOM) methods for solving it. Specifically, haplotype assembly under MEC is formulated as an integer linear programming model. Since the MEC problem is NP-hard and thus cannot be solved exactly within acceptable running time for large-scale instances, we first investigate the ability of classical SOMs to solve the haplotype assembly problem under the MEC model. Then, to overcome the limitations of classical SOMs, we propose a novel SOM approach for the problem. Extensive computational experiments on both synthesized and real datasets show that the new SOM-based algorithm efficiently reconstructs haplotype pairs with very high accuracy under realistic parameter settings. Comparison with previous methods also confirms the superior performance of the new SOM approach.
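
To make the MEC objective concrete, the following is a minimal Python sketch of how the cost of a candidate haplotype pair is usually counted: each fragment is assigned to whichever haplotype it disagrees with least, and the disagreements are summed. It is not the paper's integer linear program or its SOM algorithm. The string encoding (`"0"`/`"1"` alleles, `"-"` for an uncovered site), the function names, and the toy fragments are all assumptions made for illustration, and the exhaustive search is only feasible for a handful of SNP sites, which is the reason heuristics such as SOMs are of interest.

```python
from itertools import product

GAP = "-"  # assumed symbol for a SNP site that a fragment does not cover


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_cost(fragments, h1, h2):
    """Corrections needed when each fragment goes to its closer haplotype."""
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


def brute_force_mec(fragments):
    """Minimum MEC cost over all haplotype pairs (exponential; toy sizes only)."""
    n = len(fragments[0])
    candidates = ["".join(bits) for bits in product("01", repeat=n)]
    return min(mec_cost(fragments, h1, h2)
               for h1 in candidates for h2 in candidates)


# Hypothetical toy instance: five fragments over four SNP sites.
fragments = ["0101", "01-1", "1010", "10-0", "0111"]
cost_for_pair = mec_cost(fragments, "0101", "1010")
optimal_cost = brute_force_mec(fragments)
```

In this toy instance, only the last fragment conflicts with the pair (`"0101"`, `"1010"`), at a single site, so one correction suffices; because three distinct gap-free fragments are present, no pair of haplotypes can explain all of them without error.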
