Approximate pattern matching in LZ77-compressed texts

Suppose we want to support approximate pattern matching in a text $T[1..n]$ whose LZ77 parse consists of $z$ phrases. In this paper we show how, given that parse, we can preprocess $T$ in $O(z \log n)$ time and space such that later, given a pattern $P[1..m]$ and an edit distance $k$, we can perform approximate pattern matching in $O(z \min(mk, m + k^4) + \mathrm{occ})$ time and $O(z \log n + m + \mathrm{occ})$ space, where $\mathrm{occ}$ is the size of the output.
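To make the two ingredients of the problem concrete, the following is a minimal sketch on uncompressed text; it is not the algorithm of this paper. The function names `lz77_phrases` and `approx_match_ends` are hypothetical. The parse shown assumes one common LZ77 variant (each phrase is either a single new character or the longest prefix of the remaining text that also starts at an earlier position, with overlaps allowed), which may differ in detail from the variant the paper uses. The matcher is the classical column-by-column dynamic program, which takes O(nm) time and reports the 1-based end positions of substrings within edit distance k of the pattern.

```python
def lz77_phrases(text):
    """Greedy LZ77-style parse (assumed variant, naive quadratic-or-worse time).

    Each phrase is the longest prefix of text[i:] that also starts at some
    earlier position j < i (self-overlap allowed), or a single character
    if no such prefix exists.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        for j in range(i):
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + step])
        i += step
    return phrases


def approx_match_ends(text, pattern, k):
    """Return 1-based end positions j such that some substring of text
    ending at j has edit distance at most k from pattern."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring of
    # the text read so far that ends at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                old + 1,                            # extra text character
                col[i - 1] + 1,                     # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    t = "abababab"
    print(len(lz77_phrases(t)), lz77_phrases(t))
    print(approx_match_ends("abcdefg", "cxe", 1))
```

On highly repetitive texts the phrase count z can be far smaller than n, which is why bounds stated in terms of z, as in the abstract, can improve on the O(nm) cost of this uncompressed baseline.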
