Paper information - A hyper-heuristic for the Longest Common Subsequence problem

A hyper-heuristic for the Longest Common Subsequence problem

The Longest Common Subsequence (LCS) problem is the problem of finding a longest string that is a subsequence of every member of a given set of strings. It has applications in FPGA circuit minimization, data compression, and bioinformatics, among others. The problem is NP-hard in its general form, so no exact polynomial-time algorithm is known for it. Consequently, inexact algorithms have been proposed to obtain good, but not necessarily optimal, solutions in an affordable time. In this paper, a hyper-heuristic algorithm incorporated within a constructive beam search is proposed for the problem. The proposed hyper-heuristic is based on two basic heuristic functions, one of which is new in this paper, and determines dynamically which one to use for a given problem instance. The proposed algorithm is compared with state-of-the-art algorithms on simulated and real biological sequences. Extensive experimentation reveals that the proposed hyper-heuristic is superior to the state-of-the-art methods with respect to both solution quality and running time.
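The general idea of the abstract, a constructive beam search whose scoring heuristic is chosen per instance, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the two scoring functions `min_remaining` and `sum_remaining` are simple stand-ins and not the heuristics defined in the paper, and the selection rule (a cheap trial run with each heuristic at a small beam width, keeping whichever yields the longer result) is only one plausible way to decide dynamically; the abstract does not specify the actual rule.

```python
from typing import Callable, List, Sequence, Tuple

# A search state: the partial common subsequence built so far, plus, for each
# input string, the index from which the search for the next letter resumes.
State = Tuple[str, Tuple[int, ...]]
Heuristic = Callable[[Sequence[str], Tuple[int, ...]], float]


def is_common_subsequence(candidate: str, strings: Sequence[str]) -> bool:
    """Return True if candidate is a subsequence of every string."""
    for s in strings:
        it = iter(s)
        # "ch in it" consumes the iterator, so letters must appear in order.
        if not all(ch in it for ch in candidate):
            return False
    return True


def min_remaining(strings: Sequence[str], pos: Tuple[int, ...]) -> float:
    """Stand-in heuristic: shortest remaining suffix (an upper bound on growth)."""
    return min(len(s) - p for s, p in zip(strings, pos))


def sum_remaining(strings: Sequence[str], pos: Tuple[int, ...]) -> float:
    """Stand-in heuristic: total length of all remaining suffixes."""
    return sum(len(s) - p for s, p in zip(strings, pos))


def beam_search(strings: Sequence[str], score: Heuristic, beam_width: int) -> str:
    """Constructive beam search: extend partial solutions one letter at a time."""
    alphabet = sorted(set(strings[0]).intersection(*map(set, strings[1:])))
    beam: List[State] = [("", tuple(0 for _ in strings))]
    best = ""
    while True:
        candidates: List[State] = []
        for prefix, pos in beam:
            for ch in alphabet:
                new_pos = []
                for s, p in zip(strings, pos):
                    i = s.find(ch, p)
                    if i < 0:
                        break  # ch no longer occurs in this string
                    new_pos.append(i + 1)
                else:
                    candidates.append((prefix + ch, tuple(new_pos)))
        if not candidates:
            return best
        # Keep only the beam_width most promising extensions.
        candidates.sort(key=lambda c: score(strings, c[1]), reverse=True)
        beam = candidates[:beam_width]
        best = beam[0][0]


def hyper_heuristic_lcs(strings: Sequence[str], beam_width: int = 50,
                        trial_width: int = 5) -> str:
    """Pick a heuristic per instance via a cheap trial run, then search with it."""
    heuristics = [min_remaining, sum_remaining]
    chosen = max(heuristics,
                 key=lambda h: len(beam_search(strings, h, trial_width)))
    return beam_search(strings, chosen, beam_width)
```

Calling `hyper_heuristic_lcs(["ACGTACGT", "CGTTACG", "GACGTAC"])` returns a common subsequence of the three strings, which can be checked with `is_common_subsequence`. As with any beam search, the result is not guaranteed to be a longest one; a wider beam trades running time for a better chance of finding longer solutions.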

Sayyed Rasoul Mousavi | Farzaneh Tabataba
