An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic

Making the best use of high-bandwidth storage and network technologies requires faster data decompression. We present a “refine and recycle” method for LZ77-type decompressors that enables efficient high-bandwidth designs, together with an implementation in reconfigurable logic. The method refines the write commands (for literal tokens) and read commands (for copy tokens) into a set of commands that each target a single bank of block RAM. Rather than performing all the dependency calculations, it saves logic by recycling read commands that return an invalid result. A single “Snappy” decompressor built with this method in reconfigurable logic can process multiple literal or copy tokens per cycle and achieves up to 7.2 GB/s, enough to keep pace with an NVMe device. The resulting decompressor is about an order of magnitude faster and an order of magnitude more power efficient than a state-of-the-art single-core software implementation. Its logic and block RAM requirements are low enough that a set of these decompressors can be implemented on a single FPGA of reasonable size to keep up with the bandwidth of the most recent interface technologies.
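To make the refine and recycle idea concrete, here is a minimal software model in Python. It is a sketch and not the hardware design described above. It assumes a hypothetical history buffer of `NUM_BANKS` banks with `LINE`-byte lines interleaved across the banks. It also assumes simplified `Literal`/`Copy` token types rather than the actual Snappy byte format, and a sequential loop in which each bank serves at most one command per "cycle". `refine` splits each literal into single-bank write commands and each copy into single-bank read commands. `run` recycles a read whose source bytes are not yet valid by putting it back into its bank's queue, rather than computing dependencies.

```python
from collections import deque
from dataclasses import dataclass

LINE = 8       # hypothetical: bytes per block-RAM line
NUM_BANKS = 4  # hypothetical: number of block-RAM banks


@dataclass
class Literal:
    data: bytes


@dataclass
class Copy:
    offset: int  # distance back from the current output position (>= 1)
    length: int


def bank_of(addr: int) -> int:
    # Consecutive lines are interleaved across the banks.
    return (addr // LINE) % NUM_BANKS


def refine(tokens):
    """Refine tokens into commands that each touch one line of one bank."""
    queues = [deque() for _ in range(NUM_BANKS)]
    pos = 0
    for tok in tokens:
        if isinstance(tok, Literal):
            done = 0
            while done < len(tok.data):
                dst = pos + done
                n = min(len(tok.data) - done, LINE - dst % LINE)
                queues[bank_of(dst)].append(("write", dst, tok.data[done:done + n]))
                done += n
            pos += len(tok.data)
        else:
            done = 0
            while done < tok.length:
                src, dst = pos - tok.offset + done, pos + done
                # n <= offset keeps each piece's source strictly before its
                # destination; neither range crosses a line (bank) boundary.
                n = min(tok.length - done, tok.offset,
                        LINE - src % LINE, LINE - dst % LINE)
                queues[bank_of(src)].append(("read", src, dst, n))
                done += n
            pos += tok.length
    return queues, pos


def run(tokens):
    """Execute refined commands, recycling reads whose source is not yet valid."""
    queues, size = refine(tokens)
    mem = bytearray(size)
    valid = [False] * size
    cycles = recycled = 0
    while any(queues):
        cycles += 1
        for q in queues:  # each bank handles at most one command per cycle
            if not q:
                continue
            cmd = q.popleft()
            if cmd[0] == "write":
                _, dst, data = cmd
                mem[dst:dst + len(data)] = data
                for i in range(dst, dst + len(data)):
                    valid[i] = True
            else:
                _, src, dst, n = cmd
                if all(valid[src:src + n]):
                    # Data returned from the source bank becomes a write
                    # command for the bank that holds the destination.
                    data = bytes(mem[src:src + n])
                    queues[bank_of(dst)].append(("write", dst, data))
                else:
                    q.append(cmd)  # recycle: retry later
                    recycled += 1
    return bytes(mem), cycles, recycled
```

For example, `run([Literal(b"abc"), Copy(3, 9)])` returns `b"abcabcabcabc"` together with the cycle and recycle counts. Capping each copy piece at `offset` bytes means a piece never reads bytes it writes itself. Overlapping, run-length-style copies therefore still resolve, since their later pieces are simply recycled until the earlier pieces have been written.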
