New interfaces that connect CPUs and accelerators at memory-class bandwidth create both opportunities and challenges for accelerator design. This thesis studies one such accelerator, a decompressor for Parquet files compressed with the Snappy library. Our design targets reconfigurable logic (FPGAs) attached via the Open Coherent Accelerator Processor Interface (OpenCAPI) at 25.6 GB/s. We give an overview of previous research on hardware-based (de)compression engines, then present and analyze our design. Much of the challenge in designing the decompression engine stems from the need to process more than one token per cycle. In our design, a single engine can process two tokens per cycle, and a Xilinx KU15P FPGA is expected to support multiple such engines. The input and output throughput of a single engine range over 3.9$\sim$6.3 bytes/cycle and 8.3$\sim$15 bytes/cycle, respectively. Based on the implementation results, a single engine of the proposed design could operate at 140 MHz, which corresponds to 0.51$\sim$0.82 GB/s of input throughput or 1.08$\sim$1.96 GB/s of output throughput. The Parquet format allows multiple blocks to be decompressed in parallel when multiple units are instantiated. With the latest generation of FPGAs, we estimate that at most 28 units can be supported, giving a total input/output bandwidth between 14.28/30.24 GB/s and 22.96/54.88 GB/s. Because the output bandwidth can exceed the interface bandwidth when multiple engines are instantiated, the design is especially effective when combined with a filter engine that reduces the output size.
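The throughput figures above follow from simple arithmetic on bytes per cycle, clock frequency, and engine count. The sketch below reproduces them for illustration only. The function names are hypothetical, and it assumes that "GB" in the per-engine figures means 2^30 bytes, since that interpretation matches the quoted numbers; the aggregate figures appear to be the rounded per-engine values multiplied by 28.

```python
# Illustrative arithmetic only; names are hypothetical, not from the thesis.
GIB = 2 ** 30            # assumption: quoted "GB" figures match 2**30 bytes
CLOCK_HZ = 140e6         # single-engine clock from the implementation results
MAX_ENGINES = 28         # estimated upper bound on engines per FPGA
INTERFACE_GB_S = 25.6    # OpenCAPI bandwidth quoted in the abstract


def engine_throughput(bytes_per_cycle, clock_hz=CLOCK_HZ):
    """Per-engine throughput in GB/s, rounded to two decimals as in the text."""
    return round(bytes_per_cycle * clock_hz / GIB, 2)


def aggregate_throughput(per_engine_gb_s, engines=MAX_ENGINES):
    """Total throughput when `engines` independent units run in parallel."""
    return per_engine_gb_s * engines


# Per-engine ranges: input 3.9 to 6.3 bytes/cycle, output 8.3 to 15 bytes/cycle.
input_low, input_high = engine_throughput(3.9), engine_throughput(6.3)
output_low, output_high = engine_throughput(8.3), engine_throughput(15)

# Aggregate ranges for 28 engines.
total_in = (aggregate_throughput(input_low), aggregate_throughput(input_high))
total_out = (aggregate_throughput(output_low), aggregate_throughput(output_high))

# The aggregate output can exceed the interface bandwidth, which is why a
# downstream filter engine that shrinks the output is attractive.
output_exceeds_interface = total_out[0] > INTERFACE_GB_S

print(f"per-engine input : {input_low:.2f} to {input_high:.2f} GB/s")
print(f"per-engine output: {output_low:.2f} to {output_high:.2f} GB/s")
print(f"aggregate input  : {total_in[0]:.2f} to {total_in[1]:.2f} GB/s")
print(f"aggregate output : {total_out[0]:.2f} to {total_out[1]:.2f} GB/s")
print(f"output exceeds interface: {output_exceeds_interface}")
```

Under these assumptions, even the low end of the aggregate output range is above the 25.6 GB/s interface bandwidth, which is consistent with the remark about pairing the decompressor with a filter engine.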