Settling the Intractability of Multiple Alignment

Multiple alignment is a core problem in computational biology that has received much attention over the years, in the form of both heuristics and hardness results. Most expositions of the problem describe it as NP-hard and cite one of the available hardness results. Prior to this paper, however, not even the most elementary variation of the problem, multiple alignment under the unit metric, had been proved hard. The aim of this paper is to settle the NP-hardness of the most common variations of multiple alignment. The following variations are shown to be NP-hard for all metrics over binary or larger alphabets: MULTIPLE ALIGNMENT WITH SP-SCORE, STAR ALIGNMENT, and TREE ALIGNMENT (for a given phylogeny). In addition, NP-hardness results are provided for CONSENSUS PATTERNS and SUBSTRING PARSIMONY.
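
To make the objective behind MULTIPLE ALIGNMENT WITH SP-SCORE concrete, the sketch below evaluates the sum-of-pairs cost of one given alignment under the unit metric. It is only an illustration and is not taken from the paper: the abstract does not define the score, so the code assumes the usual definition (the sum, over all pairs of rows, of the column-by-column symbol distances) and assumes that the gap symbol "-" is treated as one more letter at distance 1 from every other letter. The names GAP, unit_metric, and sp_score are hypothetical.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol, treated as an extra letter of the alphabet


def unit_metric(a, b):
    """Unit metric: distance 0 between equal symbols, 1 otherwise."""
    return 0 if a == b else 1


def sp_score(alignment, dist=unit_metric):
    """Sum-of-pairs cost of an alignment given as equal-length rows.

    Assumed definition: for every pair of rows, add up the distance
    between the two symbols in each column, then sum over all pairs.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        dist(x, y)
        for r, s in combinations(alignment, 2)
        for x, y in zip(r, s)
    )


# Three binary strings padded with gaps to a common length.
example = ["0110", "01-0", "1-10"]
print(sp_score(example))
```

Scoring a fixed alignment in this way takes time polynomial in the number and length of the rows. The hardness result concerns the optimization problem, that is, choosing where to place the gaps so that this cost is minimized.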

[1]  Isaac Elias.  Settling the Intractability of Multiple Alignment , 2003, ISAAC.
