Using genetic algorithm in reconstructing single individual haplotype with minimum error correction

Reconstructing reliable Single Individual Haplotypes (SIHs) has become one of the core issues in whole-genome research, as previous work has shown that haplotypes carry more information than individual Single Nucleotide Polymorphisms (SNPs). Although advances in high-throughput sequencing technologies have made sequence information easier to obtain, sequences produced by current technologies inevitably contain sequencing errors and missing information. The SIH reconstruction problem can be formulated as bi-partitioning the input SNP fragment matrix into paternal and maternal sections so that the number of error corrections is minimized (the minimum error correction, or MEC, model); this problem has been proved to be NP-hard. Several heuristic and greedy algorithms have been designed and implemented to solve it; most of them, however, (1) cannot handle data sets with high error rates and/or (2) can only handle binary input matrices. In this study, we introduce a Genetic Algorithm (GA) based method, named GAHap, to reconstruct SIHs with the lowest number of error corrections. GAHap is equipped with a fitness function designed to obtain better reconstruction rates. GAHap is also compared with existing methods to show its ability to generate highly reliable solutions.
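
To make the MEC formulation concrete, the following minimal Python sketch scores one candidate bi-partition of a small fragment matrix. It is an illustration only and not GAHap's fitness function, which the abstract does not specify. The names `consensus`, `mismatches`, and `mec_score`, the use of `-` for a missing call, the majority-vote consensus, and the toy fragments are all assumptions made for this example. Alleles are written as letters rather than 0/1 to show that the score itself does not require a binary matrix.

```python
from collections import Counter

GAP = "-"  # assumed symbol for a missing SNP call


def consensus(fragments):
    """Column-wise majority allele of a fragment group, ignoring gaps."""
    result = []
    for column in zip(*fragments):
        counts = Counter(allele for allele in column if allele != GAP)
        # An all-gap column stays undetermined.
        result.append(counts.most_common(1)[0][0] if counts else GAP)
    return "".join(result)


def mismatches(fragment, haplotype):
    """Number of called positions where the fragment disagrees with the haplotype."""
    return sum(
        1
        for allele, ref in zip(fragment, haplotype)
        if allele != GAP and ref != GAP and allele != ref
    )


def mec_score(fragments, assignment):
    """Return (MEC score, [haplotype_0, haplotype_1]) for one bi-partition.

    assignment[i] is 0 or 1 and names the section that fragment i is placed in.
    """
    groups = ([], [])
    for fragment, side in zip(fragments, assignment):
        groups[side].append(fragment)
    haplotypes = [consensus(group) for group in groups]
    score = sum(
        mismatches(fragment, haplotypes[side])
        for fragment, side in zip(fragments, assignment)
    )
    return score, haplotypes


# Toy fragment matrix: rows are fragments, columns are SNP sites.
fragments = ["AC-T", "ACGT", "A-GA", "TG-A", "TGCA", "-GCA"]
assignment = [0, 0, 0, 1, 1, 1]
print(mec_score(fragments, assignment))
```

In a GA setting, a bit vector such as `assignment` is one natural choice of chromosome, with the score above as the quantity to minimize; whether GAHap encodes its individuals this way is not stated in the abstract.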
