An Algorithm and Applications to Sequence Alignment with Weighted Constraints

Given two sequences S1 and S2 and a constraint sequence C, a longest common subsequence of S1 and S2 that contains C as a subsequence is called a constrained longest common subsequence of S1 and S2 with respect to C. Similarly, an optimal alignment of S1 and S2 subject to the constraint C is called a constrained pairwise sequence alignment of S1 and S2 with respect to C. Previous work has shown that the constrained longest common subsequence problem is a special case of the constrained pairwise sequence alignment problem, and that both can be solved in O(rnm) time, where r, n, and m are the lengths of C, S1, and S2, respectively. In this paper, we extend the definition of constrained pairwise sequence alignment to a more flexible version, called weighted constrained pairwise sequence alignment, in which some constraints may be ignored. We first give an O(rnm)-time algorithm for the weighted constrained pairwise sequence alignment problem. We then show that this extension can be applied to constraint-related problems that previous algorithms for the constrained longest common subsequence problem or the constrained pairwise sequence alignment problem cannot solve. The extension therefore offers a new model for sequence analysis that covers cases the earlier formulations do not.
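To make the O(rnm) bound concrete, the following is a minimal sketch of the standard dynamic program for the unweighted constrained longest common subsequence problem. It is not the weighted algorithm of this paper, whose weighting scheme the abstract does not specify. The function name `constrained_lcs_length` and the table layout `dp[k][i][j]` are assumptions made for illustration only.

```python
def constrained_lcs_length(s1, s2, c):
    """Length of a longest common subsequence of s1 and s2 that contains c
    as a subsequence, or None if no such common subsequence exists.

    Illustrative sketch of the standard unweighted recurrence; runs in
    O(r*n*m) time and space with r = len(c), n = len(s1), m = len(s2).
    """
    n, m, r = len(s1), len(s2), len(c)
    NEG = float("-inf")  # marks states where c[:k] cannot be embedded

    # dp[k][i][j]: best length over s1[:i], s2[:j] containing c[:k]
    dp = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]

    for k in range(r + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 or j == 0:
                    # An empty prefix can only satisfy the empty constraint.
                    dp[k][i][j] = 0 if k == 0 else NEG
                    continue
                # Skip the last character of s1 or of s2.
                best = max(dp[k][i - 1][j], dp[k][i][j - 1])
                if s1[i - 1] == s2[j - 1]:
                    # Match that does not consume a constraint character.
                    best = max(best, dp[k][i - 1][j - 1] + 1)
                    # Match that also consumes constraint character c[k-1].
                    if k > 0 and s1[i - 1] == c[k - 1]:
                        best = max(best, dp[k - 1][i - 1][j - 1] + 1)
                dp[k][i][j] = best

    result = dp[r][n][m]
    return None if result == NEG else result
```

For example, `constrained_lcs_length("xay", "yxa", "")` returns 2 (the unconstrained LCS "xa"), while `constrained_lcs_length("xay", "yxa", "y")` returns 1, because forcing "y" into the common subsequence excludes "xa". The three nested loops over k, i, and j are where the O(rnm) cost comes from.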
