Linear Time Probabilistic Algorithms for the Singular Haplotype Reconstruction Problem from SNP Fragments

In this paper, we develop a probabilistic model that addresses two realistic scenarios in the singular haplotype reconstruction problem: the incompleteness and inconsistency introduced by the DNA sequencing process that generates the input haplotype fragments, and the common practice for generating synthetic data in experimental algorithm studies. Within this model we design three algorithms that reconstruct the two unknown haplotypes from a given matrix of haplotype fragments with provably high probability and in time linear in the size of the input matrix. We also present experimental results that are consistent with the theoretical efficiency of these algorithms. The software implementing our algorithms is publicly available, together with a real-time online demonstration.
