A Versatile Compression Method for Floating-Point Data Stream

With rapid advances in supercomputing and numerical simulation, the output of scientific computing is growing quickly, which creates serious challenges for data sharing and archiving. Data compression can mitigate these challenges by reducing the size of the data to be stored or transferred. However, a compressor must strike a good balance between compression ratio and throughput before it can be used in high-end computing environments. In this paper, we propose and evaluate a versatile compression method for floating-point data. First, it achieves much better compression ratios than existing general-purpose compression methods, with promising throughput. Second, it supports asymmetric decompression: losslessly compressed data can be decompressed lossily, which facilitates data analysis under different precision requirements. Third, it can leverage different existing general-purpose compressors (for instance, zlib and lz4) and so provides more flexible trade-offs between compression ratio and throughput. Evaluations demonstrate that our compressor achieves compression ratios comparable to those of the best compressors, while its compression and decompression throughputs can be 10 times higher than theirs. Single-thread compression throughput can reach 135 MB/s, and single-thread decompression throughput can reach 194 MB/s.
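The abstract does not describe the algorithm itself, so the following is only a minimal sketch of how asymmetric decompression on top of a general-purpose compressor could work. It assumes a byte-plane layout (byte k of every double is stored and compressed separately with zlib); this layout and the function names `compress_planes` and `decompress_planes` are illustrative assumptions, not the paper's method.

```python
import struct
import zlib


def compress_planes(values):
    """Split big-endian doubles into 8 byte planes and zlib-compress each one.

    Assumed layout for illustration only: plane 0 holds the sign and high
    exponent bits of every value, plane 7 the lowest mantissa bits.
    """
    raw = struct.pack(f">{len(values)}d", *values)
    return [zlib.compress(raw[k::8]) for k in range(8)]


def decompress_planes(planes, count, keep=8):
    """Rebuild `count` doubles from the first `keep` planes.

    keep=8 is lossless; a smaller `keep` skips the low-order planes and
    fills them with zero bytes, which truncates the mantissa.
    """
    cols = [zlib.decompress(p) for p in planes[:keep]]
    cols += [bytes(count)] * (8 - keep)  # zero-fill the dropped planes
    # Interleave the planes back into 8-byte big-endian values.
    raw = bytes(b for row in zip(*cols) for b in row)
    return list(struct.unpack(f">{count}d", raw))


if __name__ == "__main__":
    data = [1.0, 3.141592653589793, -2.5e10, 1e-300]
    planes = compress_planes(data)
    print(decompress_planes(planes, len(data)))          # full precision
    print(decompress_planes(planes, len(data), keep=4))  # reduced precision
```

In this sketch the data is always stored losslessly, and the reader chooses the precision at decompression time by deciding how many planes to decode. Replacing `zlib` with another general-purpose compressor would change the ratio and throughput trade-off without changing the layout.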
