Real-time traversal in grammar-based compressed files

Summary form only given. In text compression applications, it is important to be able to process compressed data without complete decompression. It is therefore crucial to study compression methods that allow time- and space-efficient access to any fragment of a compressed file. We study the real-time recovery of consecutive symbols from compressed files in the context of grammar-based compression. In this setting, a compressed text is represented as a small (a few Kb) dictionary D containing a set of code words, and a very long (a few Mb) string of symbols drawn from D. The space efficiency of this kind of compression is comparable with that of standard compression methods based on the Lempel-Ziv approach. We show that one can visit consecutive symbols of the original text, moving from one symbol to the next in constant time and using O(|D|) extra space. This algorithm improves on the on-line linear (amortised) time algorithm presented in (L. Gasieniec et al., Proc. 13th Int. Symp. on Fund. of Comp. Theo., LNCS, vol.2138, p.138-152, 2001).
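To make the setting concrete, the following is a minimal sketch of a naive stack-based traversal of a grammar-compressed text. It is not the real-time algorithm announced above: it assumes every code word expands to exactly two items (terminals or other code words), and the time spent between two consecutive symbols can grow with the nesting depth of the dictionary, so the cost per symbol is constant only in the amortised sense. The names `traverse`, `toy_dictionary` and `toy_sequence`, and the encoding of code words as integers, are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Assumed encoding: a terminal is a one-character str, a code word is an int id.
Symbol = Union[str, int]


def traverse(dictionary: Dict[int, Tuple[Symbol, Symbol]],
             sequence: List[Symbol]) -> Iterator[str]:
    """Yield the symbols of the original text one by one, left to right.

    Only an explicit stack is kept, so the text is never fully decompressed.
    For an acyclic dictionary the stack holds at most (nesting depth + 1)
    items, which is bounded by |D| + 1.
    """
    for item in sequence:
        stack: List[Symbol] = [item]
        while stack:
            top = stack.pop()
            if isinstance(top, str):
                # Terminal reached: this is the next symbol of the text.
                yield top
            else:
                # Code word: expand it, pushing the right part first so
                # that the left part is processed next.
                left, right = dictionary[top]
                stack.append(right)
                stack.append(left)


# Hypothetical toy input: code word 0 = "ab", 1 = "abc", 2 = "abcab".
toy_dictionary = {0: ("a", "b"), 1: (0, "c"), 2: (1, 0)}
toy_sequence = [2, "d", 1]

if __name__ == "__main__":
    print("".join(traverse(toy_dictionary, toy_sequence)))  # abcabdabc
```

Because `traverse` is a generator, a caller can stop after any number of symbols without expanding the rest of the sequence. The gap between this sketch and the stated result is the worst-case delay: here a single step may descend through many nested code words before reaching a terminal, whereas the result above bounds every step by a constant.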