Paper Information - Hardness Results on Local Multiple Alignment of Biological Sequences

This paper studies the local multiple alignment problem: given protein or DNA sequences, locate a region (i.e., a substring) of fixed length in each sequence so that the score determined from the set of regions is optimized. We consider the following scoring schemes: the relative entropy score (i.e., average information content), the sum-of-pairs score, and a relative entropy-like score introduced by Li et al. We prove that local multiple alignment is NP-hard under each of these scoring schemes. In particular, we prove that it is APX-hard under relative entropy scoring. This implies that, unless P = NP, the problem has no polynomial time approximation scheme, that is, no polynomial time algorithm whose worst-case approximation error can be made arbitrarily small. Several related theoretical results are also provided.
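To make the problem statement concrete, here is a minimal Python sketch of the two most familiar scoring schemes and of an exhaustive search over region choices. It is an illustration under stated assumptions, not the paper's formulation: the relative entropy score is taken as the sum over columns of f(a) log2(f(a) / p(a)) with a uniform background p, the sum-of-pairs score counts matching positions over all pairs of regions, and both are maximized. The function names are hypothetical. The search tries every combination of start positions, so its running time grows exponentially with the number of sequences.

```python
import math
from collections import Counter
from itertools import product


def relative_entropy_score(regions, background=None, alphabet="ACGT"):
    """Sum over columns of f(a) * log2(f(a) / p(a)).

    Assumption: uniform background distribution unless one is supplied.
    """
    n = len(regions)
    width = len(regions[0])
    if background is None:
        background = {a: 1.0 / len(alphabet) for a in alphabet}
    total = 0.0
    for j in range(width):
        # Observed letter frequencies in column j of the stacked regions.
        counts = Counter(r[j] for r in regions)
        for a, c in counts.items():
            f = c / n
            total += f * math.log2(f / background[a])
    return total


def sum_of_pairs_score(regions):
    """Matching positions summed over all pairs (assumed: match=1, mismatch=0)."""
    total = 0
    for i in range(len(regions)):
        for k in range(i + 1, len(regions)):
            total += sum(x == y for x, y in zip(regions[i], regions[k]))
    return total


def best_local_alignment(sequences, width, score=relative_entropy_score):
    """Try every choice of one start position per sequence; keep the best.

    Returns (score, starts, regions), or None if some sequence is shorter
    than the requested width.
    """
    start_ranges = [range(len(s) - width + 1) for s in sequences]
    best = None
    for starts in product(*start_ranges):
        regions = [s[i:i + width] for s, i in zip(sequences, starts)]
        value = score(regions)
        if best is None or value > best[0]:
            best = (value, starts, regions)
    return best


if __name__ == "__main__":
    seqs = ["TTACGTA", "ACGTCCC", "GGGACGT"]
    print(best_local_alignment(seqs, 4))
    print(best_local_alignment(seqs, 4, score=sum_of_pairs_score))
```

In this toy input the only length-4 substring shared by all three sequences is `ACGT`, so both scoring functions select it. The hardness results in the abstract concern whether any polynomial time method can replace or approximate this kind of exhaustive search in general.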

[1] P. Horton. A branch and bound algorithm for local multiple alignment. , 1996, Pacific Symposium on Biocomputing.

[2] Bin Ma,et al. Finding Similar Regions in Many Sequences , 2002, J. Comput. Syst. Sci..

[3] G. Stormo,et al. Identifying protein-binding sites from unaligned DNA fragments. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[4] Mihalis Yannakakis,et al. Optimization, approximation, and complexity classes , 1991, STOC '88.

[5] Paul Horton,et al. An Upper Bound on the Hardness of Exact Matrix Based Motif Discovery , 2005, CPM.

[6] Paul Horton. Tsukuba BB: A Branch and Bound Algorithm for Local Multiple Alignment of DNA and Protein Sequences , 2001, J. Comput. Biol..

[7] Giorgio Ausiello,et al. Approximate Solution of NP Optimization Problems , 1995, Theor. Comput. Sci..

[8] G. Stormo. Consensus patterns in DNA. , 1990, Methods in enzymology.

[9] Tao Jiang,et al. On the Complexity of Multiple Sequence Alignment , 1994, J. Comput. Biol..

[10] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[11] Sean R. Eddy,et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[12] Giorgio Gambosi,et al. Complexity and approximation: combinatorial optimization problems and their approximability properties , 1999 .

[13] M. A. McClure,et al. A Comparative Analysis of Computational Motif-Detection Methods , 1998, Pacific Symposium on Biocomputing.

[14] J. Thompson,et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[15] D. Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds , 1993 .

[16] Eugene L. Lawler,et al. Approximation Algorithms for Multiple Sequence Alignment , 1994, Theor. Comput. Sci..

[17] Carsten Lund,et al. Proof verification and the hardness of approximation problems , 1998, JACM.

[18] A. A. Reilly,et al. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences , 1990, Proteins.

[19] Hiroki Arimura,et al. On approximation algorithms for local multiple alignment , 2000, RECOMB '00.

[20] Jun S. Liu,et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.