Approximate Matching in the L1 Metric

Approximate matching is one of the fundamental problems in pattern matching, and a ubiquitous problem in real applications. The Hamming distance is a simple and well-studied example of approximate matching, motivated by typing errors or noisy channels. Biological and image processing applications assign different values to mismatches of different symbols. We consider the problem of approximate matching in the $L_1$ metric: the $k$-$L_1$-distance problem. Given a text $T = t_0, \ldots, t_{n-1}$ and a pattern $P = p_0, \ldots, p_{m-1}$, both strings of natural numbers, and a natural number $k$, we seek all text locations $i$ where the $L_1$ distance between the pattern and the length-$m$ substring of the text starting at $i$ is at most $k$, i.e. $\sum_{j=0}^{m-1} |t_{i+j} - p_j| \leq k$. We provide an algorithm that solves the $k$-$L_1$-distance problem in time $O(n\sqrt{k\log k})$. The algorithm applies a bounded divide-and-conquer approach and makes novel uses of non-boolean convolutions.
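To make the problem statement concrete, the following is a minimal sketch of a naive baseline that checks the defining inequality directly at every text location. It runs in O(nm) time in the worst case and is not the $O(n\sqrt{k\log k})$ algorithm of the paper; the function name `l1_matches_naive` and its signature are illustrative assumptions.

```python
from typing import List


def l1_matches_naive(text: List[int], pattern: List[int], k: int) -> List[int]:
    """Return every location i with sum_j |text[i+j] - pattern[j]| <= k.

    Naive baseline for illustration only (hypothetical helper, not the
    paper's method): it evaluates the definition at each alignment.
    """
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        dist = 0
        for j in range(m):
            dist += abs(text[i + j] - pattern[j])
            if dist > k:
                # The distance only grows, so this alignment cannot match.
                break
        if dist <= k:
            matches.append(i)
    return matches


if __name__ == "__main__":
    print(l1_matches_naive([1, 2, 3, 4, 5], [2, 3, 5], 1))
```

With text 1, 2, 3, 4, 5 and pattern 2, 3, 5, the three alignments have $L_1$ distances 4, 1, and 2, so only location 1 qualifies for $k = 1$, while all three qualify for $k = 4$.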
