Towards Improving Data Transfer Efficiency for Accelerators Using Hardware Compression

The overhead of moving data is the major limiting factor in today's hardware, especially in heterogeneous systems where data needs to be transferred frequently between host and accelerator memory. With the increasing availability of hardware-based compression facilities in modern computer architectures, this paper investigates the potential of hardware-accelerated I/O Link Compression as a promising approach to reduce data volumes and transfer times, thus improving the overall efficiency of accelerators in heterogeneous systems. Our considerations focus on On-the-Fly compression in both Single-Node and Scale-Out deployments. Based on a theoretical analysis, this paper demonstrates the feasibility of hardware-accelerated On-the-Fly I/O Link Compression for many workloads in a Scale-Out scenario, and for some workloads even in a Single-Node scenario. These findings are confirmed in a preliminary evaluation using software- and hardware-based implementations of the 842 compression algorithm.
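
The feasibility argument rests on a simple pipeline model: with On-the-Fly compression, compressor, link, and decompressor operate concurrently, so the slowest stage determines the effective transfer throughput, and compression only pays off when that throughput exceeds the raw link bandwidth. The following Python sketch illustrates this reasoning; the model, parameter names, and all numbers are illustrative assumptions rather than figures from the paper.

# Minimal sketch of the break-even condition for On-the-Fly (pipelined)
# I/O link compression. All values are illustrative assumptions, not
# measurements from the paper. Throughputs share one unit (e.g., GB/s).

def effective_throughput(link, ratio, comp, decomp):
    """Effective end-to-end throughput of a pipelined, compressed transfer.

    The slowest stage dominates: the compressor, the link (which now
    carries data shrunk by `ratio`), or the decompressor.
    """
    return min(comp, link * ratio, decomp)


def compression_pays_off(link, ratio, comp, decomp):
    """Compression helps only if the compressed pipeline beats the raw link."""
    return effective_throughput(link, ratio, comp, decomp) > link


if __name__ == "__main__":
    # Hypothetical Scale-Out case: a 1 GB/s network link, compression ratio 2,
    # and a hardware (de)compressor sustaining 4 GB/s per direction.
    print(compression_pays_off(link=1, ratio=2.0, comp=4, decomp=4))    # True
    # Hypothetical Single-Node case: a 16 GB/s PCIe-like link with the same
    # compressor; now the compressor itself becomes the bottleneck.
    print(compression_pays_off(link=16, ratio=2.0, comp=4, decomp=4))   # False

Under these assumed numbers the model mirrors the abstract's conclusion: compression is easier to justify over a comparatively slow Scale-Out network link than over a fast Single-Node host-accelerator link, where only sufficiently fast compressors and compressible workloads break even.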
