Optimal lossless compression of a class of dynamic sources

Proofs that a lossless encoding is optimal usually assume a stationary ergodic source. Dynamic sources with non-stationary probability distributions nevertheless occur in many practical situations where the data source is a composition of distinct sources, for example a document with multiple authors, a multimedia document, or a sequence of distinct packets sent over a communication channel. There is a vast literature on adaptive methods that tailor compression to dynamic sources. However, little is known about optimal or near-optimal methods for lossless compression of strings generated by sources that are not stationary ergodic. We present a number of asymptotically efficient algorithms that address, at least from a theoretical point of view, optimal lossless compression of dynamic sources. We assume the source produces an infinite sequence of concatenated finite strings generated by sampling a stationary ergodic source.
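To make the source model concrete, the following is a minimal sketch, not the algorithms of the paper. It assumes, purely for illustration, that each concatenated string comes from its own memoryless source, and it compares zeroth-order empirical entropy under one global model with the length-weighted entropy under one model per segment. The function names and the two example segments are hypothetical. The sketch ignores the cost of signalling segment boundaries and model parameters.

```python
import math
from collections import Counter

def empirical_entropy(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def segmented_cost(segments):
    """Length-weighted entropy when each segment gets its own statistics."""
    total = sum(len(seg) for seg in segments)
    return sum(len(seg) * empirical_entropy(seg) for seg in segments) / total

# Two hypothetical segments with opposite symbol biases (an assumption,
# chosen only to make the effect visible).
segments = ["a" * 7 + "b", "b" * 7 + "a"]
whole = "".join(segments)

global_cost = empirical_entropy(whole)   # one model for the whole string
local_cost = segmented_cost(segments)    # one model per segment

print(f"global model:      {global_cost:.4f} bits/symbol")
print(f"per-segment model: {local_cost:.4f} bits/symbol")
```

Because entropy is concave, the per-segment figure can never exceed the global one; the gap is what a method that tracks the changing source stands to gain, before accounting for any overhead.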
