GhostSZ: A Transparent FPGA-Accelerated Lossy Compression Framework

High-performance computing (HPC) applications often generate enormous amounts of data that must be transferred for checkpointing, in situ processing, or post-execution analysis. To reduce the related network traffic and storage consumption, lossy compression schemes that target scientific data are often used. SZ compression emerged three years ago and has gained much attention because of its high compression ratio. However, performing SZ compression can take half a day per terabyte of data, which could hinder adoption. We propose GhostSZ, an FPGA framework that accelerates tasks in SZ at line rate and therefore transparently. The critical problem to be overcome is the tight data dependence central to SZ. GhostSZ solves this with a data transfer path that uses novel staged hardware. We test our implementation with both synthetic and real HPC application data and show a 9.5× to 80× speedup when comparing one FPGA pipeline against one CPU core running the optimized production version on a state-of-the-art CPU, and an 8.2× speedup per chip. Much of the variance in speedup arises because the FPGA already runs at line rate, so it benefits less from optimizations that help the CPU only on the most favorable data sets. The significance of this work is the possibility of a major reduction in required networking and storage in HPC installations. For example, using GhostSZ, fewer than 10 FPGAs would be sufficient to handle the entire I/O bandwidth of the top entry on the latest IO-500 list.
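The data dependence mentioned above can be illustrated with a minimal sketch. This is not the GhostSZ hardware design or the production SZ code; it is a simplified one-dimensional stand-in that assumes a previous-value predictor and uniform error-bounded quantization. The function name `sz_like_quantize` and the predictor seed of 0.0 are hypothetical choices made for illustration.

```python
def sz_like_quantize(data, error_bound):
    """Simplified, SZ-style prediction and quantization (illustrative only).

    Each value is predicted from the previously *reconstructed* value,
    so iteration i cannot start until iteration i-1 has finished.
    """
    codes = []          # integer quantization codes (what an encoder would store)
    reconstructed = []  # values a decompressor would recover
    prev = 0.0          # assumption: predictor starts from 0.0

    for x in data:
        prediction = prev
        # Quantize the prediction error into bins of width 2 * error_bound.
        code = round((x - prediction) / (2 * error_bound))
        value = prediction + 2 * error_bound * code
        codes.append(code)
        reconstructed.append(value)
        # Loop-carried dependence: the next prediction needs this result.
        prev = value

    return codes, reconstructed


if __name__ == "__main__":
    samples = [1.0, 1.05, 1.2, 0.9]
    codes, recon = sz_like_quantize(samples, error_bound=0.01)
    print(codes)
    print(recon)
```

Because each prediction uses the reconstructed value rather than the original input, the loop cannot be parallelized across elements without changing the prediction scheme, which is the kind of dependence a pipelined hardware design has to work around.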
