The Complexity of the Single Individual SNP Haplotyping Problem

We present several new results on the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype fragments. We consider the complexity of the problems Minimum Error Correction (MEC) and Longest Haplotype Reconstruction (LHR) under different restrictions on the input data. Specifically, we look at the gapless case, where every row of the input corresponds to a gapless haplotype fragment, and the 1-gap case, where at most one gap per fragment is allowed. We prove that MEC is APX-hard in the 1-gap case and still NP-hard in the gapless case. In addition, we question earlier claims that MEC is NP-hard even when the input matrix is restricted to being completely binary. Concerning LHR, we show that this problem is NP-hard and APX-hard in the 1-gap case (and thus also in the general case), but is solvable in polynomial time in the gapless case.
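To make the input restrictions and the MEC objective concrete, the following is a minimal Python sketch and not the paper's own construction. It assumes the usual formulation, which the abstract does not spell out: fragments are strings over `0`, `1` and a hole symbol `-`; a gap is a maximal run of holes lying between two called sites (leading and trailing holes do not count); and MEC asks for two haplotypes minimising the total number of fragment entries that must be flipped so that every fragment agrees with one of them. The names `count_gaps`, `distance` and `mec_brute_force` are illustrative, and the exhaustive search is exponential in the number of sites, so it is only usable on toy inputs.

```python
from itertools import product

HOLE = "-"  # assumed symbol for an unsequenced site


def count_gaps(fragment: str) -> int:
    """Count maximal runs of holes that lie between two called sites."""
    core = fragment.strip(HOLE)  # leading/trailing holes are not gaps
    # Replace called sites by spaces so that split() yields the hole runs.
    return len(core.replace("0", " ").replace("1", " ").split())


def distance(fragment: str, haplotype: str) -> int:
    """Number of called sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != HOLE and f != h)


def mec_brute_force(fragments: list[str]) -> int:
    """Minimum number of corrections over all haplotype pairs (toy sizes only)."""
    n = len(fragments[0])
    candidates = ["".join(bits) for bits in product("01", repeat=n)]
    return min(
        sum(min(distance(f, h1), distance(f, h2)) for f in fragments)
        for h1 in candidates
        for h2 in candidates
    )


if __name__ == "__main__":
    fragments = ["011-", "0110", "1001", "-001", "1101"]
    print([count_gaps(f) for f in fragments])  # every fragment here is gapless
    print(mec_brute_force(fragments))
```

Under these assumptions, an instance belongs to the gapless case when `count_gaps` returns 0 for every row, and to the 1-gap case when it returns at most 1 for every row.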
