Bit-Error Aware Quantization for DCT-based Lossy Compression

Scientific simulations run on high-performance computing (HPC) systems produce large amounts of data, which cause a severe I/O bottleneck and a heavy storage burden. Compression can mitigate these overheads by reducing the data size. Unlike traditional lossless compressors, error-controlled lossy compressors such as SZ, ZFP, and DCTZ, designed for scientists who demand not only high compression ratios but also a guaranteed degree of precision, are gaining prominence. While the rate-distortion efficiency of recent lossy compressors, especially the DCT-based one, is promising due to its high-compression encoding, the overall coding architecture is still conservative, which calls for a quantization scheme that balances the different encoding possibilities against the resulting rate-distortion trade-offs. In this paper, we aim to improve the performance of a DCT-based compressor, namely DCTZ, by optimizing its quantization model and encoding mechanism. Specifically, we propose a bit-efficient quantizer based on the DCTZ framework, develop a unique ordering mechanism based on the quantization table, and extend the encoding index. We evaluate the rate-distortion performance of our optimized DCTZ using real-world HPC datasets. Our experimental evaluations demonstrate that, on average, the proposed approach improves the compression ratio of the original DCTZ by 1.38x. Moreover, combined with the extended encoding mechanism, the optimized DCTZ is competitive with the state-of-the-art lossy compressors SZ and ZFP.
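For readers unfamiliar with transform-based lossy compression, the general idea of quantizing DCT coefficients can be sketched as follows. This is a minimal illustration under our own assumptions, not the DCTZ quantizer or the bit-efficient quantizer proposed in this paper: it uses a plain uniform quantizer with a single bin width, and the helper names (`dct_matrix`, `quantize`, `dequantize`, `roundtrip`) are hypothetical. It shows that each coefficient is recovered to within half a bin width, and that, because the transform is orthonormal, the L2 error of the reconstructed block equals the L2 error of the coefficients.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return m


def quantize(coeffs, bin_width):
    """Map each coefficient to the integer index of its nearest bin center."""
    return np.rint(coeffs / bin_width).astype(np.int64)


def dequantize(indices, bin_width):
    """Recover the bin-center value for each integer index."""
    return indices.astype(np.float64) * bin_width


def roundtrip(block, bin_width):
    """Transform a 1-D block, quantize the coefficients, and reconstruct it."""
    m = dct_matrix(block.size)
    coeffs = m @ block
    indices = quantize(coeffs, bin_width)
    recon = m.T @ dequantize(indices, bin_width)  # inverse of an orthonormal transform
    return indices, recon


if __name__ == "__main__":
    block = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))  # smooth synthetic test signal
    indices, recon = roundtrip(block, bin_width=1e-3)
    print("nonzero indices:", np.count_nonzero(indices), "of", indices.size)
    print("max pointwise error:", np.max(np.abs(recon - block)))
```

The integer indices are what an entropy coder would subsequently encode; a smaller bin width lowers distortion but widens the range of indices, which is the rate-distortion trade-off the abstract refers to. Note that this sketch bounds the per-coefficient error, not the pointwise error of the reconstructed data.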
