Sequence comparison on the Connection Machine

We give two parallel algorithms for sequence comparison on the Connection Machine 2 (CM-2). The specific comparison measure we compute is the edit distance: given a finite alphabet Σ and two input sequences X ∈ Σ⁺ and Y ∈ Σ⁺, the edit distance d(X,Y) is the minimum cost of transforming X into Y via a series of weighted insertions, deletions and substitutions of characters. The edit distance comparison measure is equivalent to or subsumes a broad range of well-known sequence comparison measures. The CM-2 is very fast at performing parallel prefix operations. Our contribution consists of casting the problem in terms of these operations. Our first algorithm computes d(X,Y) using N processors and O(MS) time units, where M = min(|X|,|Y|) + 1, N = max(|X|,|Y|) + 1 and S is the time required for a parallel prefix operation. The second algorithm computes d(X,Y) using NM processors and O((log N log M)(S + R)) time units, where R is the time for a 'router' communication step, one in which each processor is able to read data, in parallel, from the memory of any other processor. Our algorithms can also be applied to several variants of the problem, such as subsequence comparisons, and one-many and many-many comparisons on 'sequence databases'.
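As an illustration of how the edit-distance recurrence can be cast in terms of prefix operations, the following is a minimal sequential sketch, not the CM-2 implementation described above. It assumes uniform unit costs by default; the function name `edit_distance_scan` and the parameters `ins`, `dele` and `sub` are hypothetical. Each row of the dynamic-programming table is first filled with the candidates that depend only on the previous row (an elementwise step), and the remaining left-to-right dependency D[j] = min(t[j], D[j-1] + ins) is then resolved with a single prefix-min, simulated here by `itertools.accumulate`.

```python
from itertools import accumulate


def edit_distance_scan(x, y, ins=1, dele=1, sub=1):
    """Edit distance computed row by row, finishing each row with a min-scan.

    Assumption: uniform insertion, deletion and substitution costs.
    """
    n = len(y)
    # Row 0: turning the empty prefix of x into y[:j] costs j insertions.
    prev = [j * ins for j in range(n + 1)]
    for xi in x:
        # Candidates that depend only on the previous row; on a parallel
        # machine each position j could be handled by its own processor.
        t = [prev[0] + dele]
        for j in range(1, n + 1):
            diag = prev[j - 1] + (0 if xi == y[j - 1] else sub)
            t.append(min(prev[j] + dele, diag))
        # D[j] = min over k <= j of t[k] + (j - k) * ins.
        # Subtract k * ins, take a prefix-min, then add j * ins back.
        shifted = [t[k] - k * ins for k in range(n + 1)]
        scanned = list(accumulate(shifted, min))
        prev = [scanned[j] + j * ins for j in range(n + 1)]
    return prev[n]


if __name__ == "__main__":
    print(edit_distance_scan("kitten", "sitting"))
```

In this sketch the outer loop runs once per character of `x`, and each iteration uses one scan over a row of length |Y| + 1, which mirrors the shape of the O(MS) bound when the shorter sequence drives the loop.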
