Compact Directed Acyclic Word Graphs for a Sliding Window

The suffix tree is a well-known and widely studied data structure that is highly useful for string matching. The suffix tree of a string w of length n can be constructed in O(n) time and space. Larsson gave an efficient algorithm for maintaining a suffix tree over a sliding window, which is useful for prediction by partial matching (PPM) style statistical data compression schemes. The compact directed acyclic word graph (CDAWG) is a more space-economical data structure for indexing a string. In this paper we propose a linear-time algorithm for maintaining a CDAWG over a sliding window.
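To make the sliding-window indexing problem concrete, the following is a deliberately naive baseline in Python. It is not the algorithm proposed in the paper, nor Larsson's: it keeps the last few characters of the input and rebuilds an uncompacted suffix trie in quadratic time whenever the window has changed. The class name `NaiveSlidingWindowIndex`, its methods, and the window size are assumptions made for illustration only.

```python
from collections import deque


class NaiveSlidingWindowIndex:
    """Illustrative baseline: indexes all substrings of the last `size` characters."""

    def __init__(self, size):
        # A bounded deque discards the oldest character when a new one arrives.
        self.window = deque(maxlen=size)
        self._trie = None  # cached suffix trie, invalidated whenever the window slides

    def append(self, ch):
        self.window.append(ch)
        self._trie = None

    def _build(self):
        # Insert every suffix of the window into a trie of nested dicts.
        # This costs O(n^2) time and space for a window of length n.
        text = "".join(self.window)
        root = {}
        for i in range(len(text)):
            node = root
            for ch in text[i:]:
                node = node.setdefault(ch, {})
        return root

    def contains(self, pattern):
        # A pattern occurs in the window iff it labels a path from the root.
        if self._trie is None:
            self._trie = self._build()
        node = self._trie
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True


index = NaiveSlidingWindowIndex(size=5)
for ch in "abracadabra":
    index.append(ch)

print("".join(index.window))   # the window now holds the last five characters
print(index.contains("abr"))   # substring of the current window
print(index.contains("cad"))   # occurred earlier, but has slid out of the window
```

The quadratic rebuild on every slide is the cost that a sliding-window suffix tree or CDAWG avoids by updating the index incrementally as characters enter and leave the window.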

[1] Dan Gusfield et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[2] Peter Weiner et al. Linear Pattern Matching Algorithms, 1973, SWAT.

[3] John G. Cleary et al. Unbounded length contexts for PPM, 1995, Proceedings DCC '95 Data Compression Conference.

[4] Ian H. Witten et al. Data Compression Using Adaptive Coding and Partial String Matching, 1984, IEEE Trans. Commun.

[5] Ayumi Shinohara et al. Construction of the CDAWG for a Trie, 2001, Stringology.

[6] Ayumi Shinohara et al. On-line construction of symmetric compact directed acyclic word graphs, 2001.

[7] David Haussler et al. Complete inverted files for efficient text retrieval and analysis, 1987, JACM.

[8] Edward M. McCreight et al. A Space-Economical Suffix Tree Construction Algorithm, 1976, JACM.

[9] Alberto Apostolico et al. The Myriad Virtues of Subword Trees, 1985.

[10] Maxime Crochemore et al. On Compact Directed Acyclic Word Graphs, 1997, Structures in Logic and Computer Science.

[11] Ayumi Shinohara et al. Space-Economical Construction of Index Structures for All Suffixes of a String, 2002, MFCS.

[12] Wojciech Rytter et al. Text Algorithms, 1994.

[13] Alistair Moffat et al. Implementing the PPM data compression scheme, 1990, IEEE Trans. Commun.

[14] Edward R. Fiala et al. Data compression with finite windows, 1989, CACM.

[15] N. Jesper Larsson. Extended application of suffix trees to data compression, 1996, Proceedings of Data Compression Conference - DCC '96.

[16] N. Jesper Larsson. Structures of String Matching and Data Compression, 1999.