New algorithms for efficient parallel string comparison

In this paper, we present new parallel algorithms for a set of classical string comparison problems: computation of string alignments, longest common subsequences (LCS) or edit distances, and longest increasing subsequences. These problems have a wide range of applications, in particular in computational biology and signal processing. We discuss the scalability of our new parallel algorithms in computation time, in memory, and in communication. Our new algorithms are based on an efficient parallel method for (min,+)-multiplication of distance matrices. The core result of this paper is a scalable parallel algorithm for multiplying implicit simple unit-Monge matrices of size <i>n</i> × <i>n</i> on <i>p</i> processors using time <i>O</i>((<i>n</i> log <i>n</i>)/<i>p</i>), communication <i>O</i>((<i>n</i> log <i>p</i>)/<i>p</i>), and <i>O</i>(log <i>p</i>) supersteps. This algorithm allows us to implement scalable LCS computation for two strings of length <i>n</i> using time <i>O</i>(<i>n</i><sup>2</sup>/<i>p</i>) and communication <i>O</i>(<i>n</i>/√<i>p</i>), requiring local memory of size <i>O</i>(<i>n</i>/√<i>p</i>) on each processor. Furthermore, our algorithm can be used to obtain the first generally work-scalable algorithm for computing the longest increasing subsequence (LIS). Our algorithm for computing the LIS of a sequence of length <i>n</i> requires computation <i>O</i>((<i>n</i> log<sup>2</sup> <i>n</i>)/<i>p</i>), communication <i>O</i>((<i>n</i> log <i>p</i>)/<i>p</i>), and <i>O</i>(log<sup>2</sup> <i>p</i>) supersteps. This is within a log <i>n</i> factor of work-optimality for the LIS problem, which can be solved sequentially in time <i>O</i>(<i>n</i> log <i>n</i>) in the comparison-based model. Our LIS algorithm is also within a log <i>p</i> factor of achieving perfectly scalable communication, and it has perfectly scalable memory requirements of <i>O</i>(<i>n</i>/<i>p</i>) per processor.
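For orientation, the two sequential primitives named in the abstract can be sketched in a few lines of Python. This is only an illustrative baseline under stated assumptions, not the parallel algorithm of the paper: `min_plus_product` is a naive cubic-time (min,+) product on explicit matrices (the paper works with implicit simple unit-Monge matrices, which this sketch does not exploit), and `lis_length` is the standard comparison-based <i>O</i>(<i>n</i> log <i>n</i>) method for the LIS length, assuming a strictly increasing subsequence is wanted. Both function names are hypothetical.

```python
from bisect import bisect_left


def min_plus_product(a, b):
    """Naive (min,+) product: c[i][k] = min over j of a[i][j] + b[j][k].

    Assumes a is n x m and b is m x r, given as explicit lists of lists.
    """
    inner = len(b)
    cols = len(b[0])
    return [
        [min(row[j] + b[j][k] for j in range(inner)) for k in range(cols)]
        for row in a
    ]


def lis_length(seq):
    """Length of a longest strictly increasing subsequence, O(n log n)."""
    # tails[k] is the smallest value that can end an increasing
    # subsequence of length k + 1 among the elements seen so far.
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)  # first position with tails[pos] >= x
        if pos == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[pos] = x  # x is a smaller end value for length pos + 1
    return len(tails)
```

For example, `lis_length([3, 1, 4, 1, 5, 9, 2, 6])` evaluates to 4 (one witness is 1, 4, 5, 9), and `min_plus_product([[0, 1], [2, 0]], [[0, 3], [1, 0]])` evaluates to `[[0, 1], [1, 0]]`.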
