Using Edit Distance in Point-Pattern Matching

Edit distance is a powerful measure of similarity in string matching: it is the minimum number of insertions, deletions, and substitutions needed to convert one string into another. This measure is often contrasted with time warping in speech processing, which measures how close two trajectories are by allowing compression and expansion operations on the time scale. Time warping can easily be generalized to measure the similarity between 1D point-patterns (ascending lists of real values), as the difference between the ith and (i−1)th points in a point-pattern can be considered the value of a trajectory at time i. However, we show that edit distance is a more natural choice, and derive a measure by calculating the minimum amount of space that must be inserted and deleted between points to convert one point-pattern into another. We show that this measure defines a metric. We also define a substitution operation such that the distance calculation automatically separates the points into matching and mismatching points. The algorithms are based on dynamic programming. The main motivation for these methods is two- and higher-dimensional point-pattern matching; we therefore generalize the methods to the 2D case and show that this generalization leads to an NP-complete problem. There are also applications for the 1D case; we briefly discuss the matching of tree ring sequences in dendrochronology.
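For orientation, the classic unit-cost string edit distance that the abstract starts from can be sketched with dynamic programming as follows. This is the standard string version only, not the point-pattern measure derived in the paper. The names `edit_distance` and `gaps` are hypothetical, chosen for this illustration; `gaps` merely shows the difference representation of a 1D point-pattern mentioned above.

```python
def edit_distance(a, b):
    """Unit-cost string edit distance (insert, delete, substitute).

    Standard dynamic programming over prefixes of a and b, keeping
    only the previous row of the table. Illustrative sketch only.
    """
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distance from the empty prefix of a to b[:j]
    for i in range(1, m + 1):
        curr = [i] + [0] * n   # distance from a[:i] to the empty prefix of b
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[n]


def gaps(points):
    """Differences between consecutive points of an ascending list.

    Hypothetical helper: turns a 1D point-pattern into the sequence
    of spaces between neighbouring points.
    """
    return [q - p for p, q in zip(points, points[1:])]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(gaps([1.0, 2.5, 4.0]))
```

The row-by-row formulation uses memory proportional to the shorter dimension of the table while still taking time proportional to the product of the two lengths.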
