A linear size index for approximate pattern matching

This paper revisits the problem of indexing a text S[1..n] for pattern matching with up to k errors. A naive solution either has a worst-case matching time of Ω(m^k) or requires Ω(n^k) space, where m is the length of the pattern. Devising a solution with better performance remained a challenge until Cole et al. (2004) [5] presented an O(n log^k n)-space index that supports k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of long sequences such as DNA, we have investigated the feasibility of a linear-size index whose time complexity is still linear in the pattern length. In particular, this paper presents an O(n)-space index that supports k-error matching in O(m + occ + (log n)^{k(k+1)} log log n) worst-case time. This index can be further compressed from O(n) words to O(n) bits with a slight increase in the time complexity.
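To make the k-error matching problem concrete, the following minimal sketch scans the text without any index, using the classical column-by-column dynamic programme in O(nm) time. It is not the index described above, only a baseline for the query that the index answers. It assumes that an "error" is a single-character insertion, deletion, or substitution (edit distance); the function name `k_error_end_positions` is hypothetical.

```python
def k_error_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return the 1-based positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Index-free baseline, O(len(text) * len(pattern)) time.
    Assumption: errors are insertions, deletions, and substitutions.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # entry (i-1, j-1) of the DP table
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # entry (i, j-1), before it is overwritten
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_error_end_positions("abcdefg", "cxe", 1))
```

The scan reads all n text characters for every query, which is the cost an index is meant to avoid for a fixed text S.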

[1] Michael T. Goodrich, et al. Range Searching Over Tree Cross Products, 2000, ESA.

[2] Gad M. Landau, et al. Text Indexing and Dictionary Matching with One Error, 2000, J. Algorithms.

[3] Gonzalo Navarro, et al. A metric index for approximate string matching, 2002, Theor. Comput. Sci.

[4] Kunihiko Sadakane, et al. Compressed Suffix Trees with Full Functionality, 2007, Theory of Computing Systems.

[5] Edward M. McCreight, et al. A Space-Economical Suffix Tree Construction Algorithm, 1976, JACM.

[6] Giovanni Manzini, et al. Opportunistic data structures with applications, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[7] Tak Wah Lam, et al. Improved Approximate String Matching Using Compressed Suffix Data Structures, 2005, ISAAC.

[8] Wing-Kai Hon, et al. Approximate String Matching Using Compressed Suffix Arrays, 2004, CPM.

[9] Archie L. Cobbs, et al. Fast Approximate Matching using Suffix Trees, 1995, CPM.

[10] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[11] Richard Cole, et al. Dictionary matching and indexing with errors and don't cares, 2004, STOC '04.

[12] Tak Wah Lam, et al. Improved Approximate String Matching Using Compressed Suffix Data Structures, 2007, Algorithmica.

[13] Johannes Nowak, et al. Text indexing with errors, 2007, J. Discrete Algorithms.

[14] Gonzalo Navarro, et al. A Hybrid Indexing Method for Approximate String Matching, 2007.

[15] Roberto Grossi, et al. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching, 2005, SIAM J. Comput.