A parallel two-pass MDL context tree algorithm for universal source coding

We present a novel lossless universal source coding algorithm that uses parallel computational units to increase the throughput. The length-N input sequence is partitioned into B blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of B, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B). Its redundancy is approximately B log(N/B) bits above Rissanen's lower bound on universal coding performance, with respect to any tree source whose maximal depth is at most log(N/B).
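The two-pass structure described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm, and it simplifies heavily: it assumes a binary alphabet, replaces the pruned MDL context tree with a single fixed context depth chosen by a two-part cost, pads contexts with zeros at the start of each block, and reports ideal Krichevsky-Trofimov (KT) code lengths instead of driving an arithmetic coder. All function names and the `max_depth` parameter are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_blocks(x, B):
    """Partition x into B contiguous blocks of near-equal length."""
    n = len(x)
    bounds = [n * b // B for b in range(B + 1)]
    return [x[bounds[b]:bounds[b + 1]] for b in range(B)]

def context_counts(block, depth):
    """Count (context, symbol) pairs; contexts are zero-padded at block start."""
    padded = [0] * depth + list(block)
    counts = Counter()
    for i in range(depth, len(padded)):
        counts[(tuple(padded[i - depth:i]), padded[i])] += 1
    return counts

def two_part_cost(counts, n):
    """Empirical code length plus (1/2) log2 n bits per context (binary alphabet)."""
    totals = Counter()
    for (ctx, _), c in counts.items():
        totals[ctx] += c
    data_bits = -sum(c * math.log2(c / totals[ctx]) for (ctx, _), c in counts.items())
    return data_bits + 0.5 * len(totals) * math.log2(n)

def choose_depth(blocks, max_depth, pool):
    """Pass 1: merge per-block counts and pick the depth with the lowest cost."""
    n = sum(len(b) for b in blocks)
    best = None
    for d in range(max_depth + 1):
        merged = Counter()
        # Each block is counted independently; only the counts are combined.
        for c in pool.map(context_counts, blocks, [d] * len(blocks)):
            merged.update(c)
        cost = two_part_cost(merged, n)
        if best is None or cost < best[0]:
            best = (cost, d)
    return best[1]

def kt_code_length(block, depth):
    """Pass 2: ideal KT code length in bits for one block; counts start at zero."""
    padded = [0] * depth + list(block)
    seen = Counter()
    bits = 0.0
    for i in range(depth, len(padded)):
        ctx, sym = tuple(padded[i - depth:i]), padded[i]
        n_sym = seen[(ctx, sym)]
        n_ctx = seen[(ctx, 0)] + seen[(ctx, 1)]
        bits -= math.log2((n_sym + 0.5) / (n_ctx + 1.0))
        seen[(ctx, sym)] += 1
    return bits

def two_pass_code_length(x, B, max_depth=4):
    """Estimate one model from the whole input, then code each block separately."""
    blocks = split_blocks(x, B)
    with ThreadPoolExecutor(max_workers=B) as pool:
        depth = choose_depth(blocks, max_depth, pool)
        per_block = list(pool.map(kt_code_length, blocks, [depth] * B))
    return depth, per_block
```

Calling `two_pass_code_length([0, 1] * 64, B=4)` returns the selected depth and one code length per block. Because every block restarts its KT counts, each block pays its own parameter-learning cost, which is the intuition behind a redundancy term that grows with B. Threads are used only to keep the sketch self-contained; in CPython they illustrate the independence of the blocks and do not give a real speedup for this kind of work.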
