Efficient algorithms for finding a longest common increasing subsequence

We study the problem of finding a longest common increasing subsequence (LCIS) of multiple sequences of numbers. The LCIS problem is fundamental to various application areas, including whole genome alignment. In this paper we give an efficient algorithm to find the LCIS of two sequences in $$O(\min(r \log \ell,\ n\ell + r) \log\log n + \mathrm{Sort}(n))$$ time, where $$n$$ is the length of each sequence, $$r$$ is the number of ordered pairs of positions at which the two sequences match, $$\ell$$ is the length of the LCIS, and $$\mathrm{Sort}(n)$$ is the time to sort $$n$$ numbers. For $$m$$ sequences, where $$m \geq 3$$, we find the LCIS in $$O(\min(mr^2,\ r \log \ell \log^m r) + m \cdot \mathrm{Sort}(n))$$ time, where $$r$$ is the total number of $$m$$-tuples of positions at which the $$m$$ sequences match. The previous results find the LCIS of two sequences in $$O(n^2)$$ and $$O(n\ell \log\log n + \mathrm{Sort}(n))$$ time. Our algorithm is faster when $$r$$ is relatively small, e.g., for $$r < \min(n^2/(\log \ell \log\log n),\ n\ell/\log \ell)$$.
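To make the quantities in the bounds concrete, the following is a minimal sketch of the textbook quadratic dynamic program for the two-sequence LCIS, together with a helper that counts the matching pairs $$r$$. It is not the algorithm of this paper; it only illustrates the problem and the parameters $$n$$, $$r$$ and $$\ell$$. The function names `lcis` and `count_matches` are hypothetical, and for simplicity the sketch stores whole subsequences as tuples, which costs an extra factor of up to $$\ell$$ in time and space.

```python
from collections import Counter
from typing import List, Sequence


def lcis(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return one longest common (strictly) increasing subsequence of a and b.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    best[j] holds a longest common increasing subsequence of the part of `a`
    processed so far and `b` that ends exactly at b[j].
    """
    best = [()] * len(b)
    for x in a:
        # cur: longest subsequence seen so far in this pass ending at a
        # value smaller than x, so that x can legally extend it.
        cur = ()
        for j, y in enumerate(b):
            if y == x and len(cur) + 1 > len(best[j]):
                best[j] = cur + (x,)
            elif y < x and len(best[j]) > len(cur):
                cur = best[j]
    return list(max(best, key=len, default=()))


def count_matches(a: Sequence[int], b: Sequence[int]) -> int:
    """Number r of ordered position pairs (i, j) with a[i] == b[j]."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(count_a[v] * count_b[v] for v in count_a)


if __name__ == "__main__":
    a = [3, 4, 9, 1]
    b = [5, 3, 8, 9, 10, 2, 1]
    print(lcis(a, b))           # [3, 9]
    print(count_matches(a, b))  # 3
```

In this example $$\ell = 2$$ and $$r = 3$$, while the two sequences have lengths 4 and 7. This is the regime the abstract singles out, where $$r$$ is small compared with $$n^2$$.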
