Multiple pattern matching in LZW compressed text

We address the problem of searching directly in LZW compressed text, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick (1975) pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach (see Journal of Computer and System Sciences, vol. 52, pp. 299-307, 1996) finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m^2 + r) time using O(n + m^2) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.
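For orientation, the baseline that the abstract compares against (decompress first, then run an Aho-Corasick machine over the plain text) can be sketched as follows. This is not the algorithm of the paper, which avoids the decompression step. It is a minimal illustration under stated assumptions: a textbook LZW variant with an initial 256-symbol dictionary and no limit on dictionary size, non-empty patterns, and hypothetical function names (`lzw_compress`, `lzw_decompress`, `build_automaton`, `search`, `search_baseline`) chosen for this example.

```python
from collections import deque


def lzw_compress(text):
    """Textbook LZW (assumed variant): 256 initial symbols, unbounded dictionary."""
    table = {chr(i): i for i in range(256)}
    codes, current = [], ""
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # register the new phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the dictionary on the fly and return the original text."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the phrase that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


def build_automaton(patterns):
    """Build goto, failure, and output functions of an Aho-Corasick machine."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence of every pattern."""
    goto, fail, output = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in output[state]:
            hits.append((i - len(p) + 1, p))
    return hits


def search_baseline(codes, patterns):
    """Baseline only: decompress the LZW codes, then search the plain text."""
    return search(lzw_decompress(codes), patterns)
```

In this sketch the search cost depends on the length of the decompressed text, whereas the bound stated in the abstract depends on the compressed length n, the total pattern length m, and the number of occurrences r.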

[1]  Wojciech Plandowski,et al.  Efficient Algorithms for Lempel-Ziv Encoding , 1996 .

[2]  Gad M. Landau,et al.  Efficient pattern matching with scaling , 1990, SODA '90.

[3]  Terry A. Welch,et al.  A Technique for High-Performance Data Compression , 1984, Computer.

[4]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[5]  Gary Benson,et al.  Two-dimensional periodicity and its applications , 1992, SODA '92.

[6]  Wojciech Rytter,et al.  An Efficient Pattern-Matching Algorithm for Strings with Short Descriptions , 1997, Nord. J. Comput..

[7]  Wojciech Plandowski,et al.  Efficient algorithms for Lempel-Ziv encoding , 1996 .

[8]  Gary Benson,et al.  Let sleeping files lie: pattern matching in Z-compressed files , 1994, SODA '94.

[9]  Udi Manber.  A text compression scheme that allows fast searching directly in the compressed file , 1997, TOIS.

[10]  Gary Benson,et al.  Efficient two-dimensional compressed matching , 1992, Data Compression Conference, 1992..

[11]  Mikkel Thorup,et al.  String Matching in Lempel—Ziv Compressed Strings , 1998, Algorithmica.

[12]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[13]  Udi Manber,et al.  A text compression scheme that allows fast searching directly in the compressed file , 1994, TOIS.

[14]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[15]  Ayumi Shinohara,et al.  An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs , 1997, CPM.