Pipelined Parallel LZSS for Streaming Data Compression on GPGPUs

In this paper, we present an algorithm, along with the design improvements needed, to port the serial Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm to a parallelized version suitable for general-purpose graphics processing units (GPGPUs), specifically for NVIDIA's CUDA framework. The two main stages of the algorithm, substring matching and encoding, are studied in detail to fit them to the GPU architecture. We conducted a detailed analysis of our performance results and compared them to serial and parallel CPU implementations of the LZSS algorithm. We also benchmarked our algorithm against two well-known, widely used programs, GZIP and ZLIB. We achieved up to 34x better throughput than the serial CPU implementation of the LZSS algorithm and up to 2.21x better throughput than the parallelized version.
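The two stages named above, substring matching and encoding, can be illustrated with a minimal serial sketch in Python. This is not the paper's GPU implementation. The window size, the match-length bounds, and the function names `longest_match`, `compress`, and `decompress` are assumptions chosen for illustration, and tokens are kept as Python objects rather than packed into the flag-bit format a real LZSS encoder would emit.

```python
WINDOW = 4096    # assumed sliding-window size (hypothetical value)
MIN_MATCH = 3    # assumed shortest match worth encoding as a pair
MAX_MATCH = 18   # assumed longest match length


def longest_match(data, pos):
    """Substring matching: brute-force search of the window before pos."""
    best_off, best_len = 0, 0
    start = max(0, pos - WINDOW)
    limit = min(MAX_MATCH, len(data) - pos)
    for cand in range(start, pos):
        length = 0
        # A match may run past pos and overlap the text being encoded.
        while length < limit and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_off, best_len = pos - cand, length
    return best_off, best_len


def compress(data):
    """Encoding: emit (offset, length) pairs for matches, else literal bytes."""
    tokens = []
    pos = 0
    while pos < len(data):
        off, length = longest_match(data, pos)
        if length >= MIN_MATCH:
            tokens.append((off, length))
            pos += length
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decompress(tokens):
    """Inverse of compress: replay literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                out.append(out[-off])  # copy byte by byte so overlaps work
        else:
            out.append(tok)
    return bytes(out)


if __name__ == "__main__":
    sample = b"abcabcabcabcxyz"
    packed = compress(sample)
    print(packed)
    print(decompress(packed) == sample)
```

In this sketch the matching loop at each position reads only the input buffer, which is the kind of independent work a parallel version could distribute across threads, whereas the encoding loop advances by a data-dependent amount and is therefore harder to parallelize.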
