Toward Decoupling the Selection of Compression Algorithms from Quality Constraints

Data-intensive scientific domains use data compression to reduce the storage space needed. Lossless data compression preserves the original information exactly, but in the domain of climate data it usually yields a compression factor of only 2:1. Lossy data compression can achieve much higher compression ratios, depending on the tolerable error or the precision needed, which is why lossy compression remains a field of active research. From the perspective of a scientist, the particular compression algorithm does not matter; the concern is the qualitative information about the loss of precision that the compression implies.
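
The trade-off between tolerable error and compression ratio can be illustrated with a minimal sketch. It does not represent any particular algorithm from the literature: the synthetic field, the uniform quantizer, and all function names (`synthetic_field`, `quantize`, `dequantize`, `lossless_ratio`, `lossy_ratio`) are assumptions made for illustration, and zlib merely stands in for a generic lossless back end. The user states only an absolute tolerance, and the quantization step is derived from it.

```python
import math
import random
import struct
import zlib


def synthetic_field(n, seed=0):
    """Hypothetical temperature-like series: smooth signal plus noise."""
    rng = random.Random(seed)
    return [280.0 + 10.0 * math.sin(i / 50.0) + rng.gauss(0.0, 0.5)
            for i in range(n)]


def lossless_ratio(values):
    """Compression ratio of the raw 64-bit doubles under zlib."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return len(raw) / len(zlib.compress(raw, 9))


def quantize(values, abs_tol):
    """Map each value to the index of the nearest multiple of 2*abs_tol.

    Rounding to the nearest grid point keeps the reconstruction error
    within abs_tol (up to floating-point rounding).
    """
    step = 2.0 * abs_tol
    return [round(v / step) for v in values]


def dequantize(codes, abs_tol):
    """Reconstruct approximate values from the quantization indices."""
    step = 2.0 * abs_tol
    return [c * step for c in codes]


def lossy_ratio(values, abs_tol):
    """Ratio of the original size to the zlib-compressed indices."""
    codes = quantize(values, abs_tol)
    packed = struct.pack(f"<{len(codes)}q", *codes)
    return (8 * len(values)) / len(zlib.compress(packed, 9))


if __name__ == "__main__":
    field = synthetic_field(5000)
    print("lossless ratio:", round(lossless_ratio(field), 2))
    for tol in (0.001, 0.01, 0.1):
        print("abs_tol", tol, "ratio:", round(lossy_ratio(field, tol), 2))
```

In this sketch the caller never names a compression algorithm. Only the tolerance is specified, which mirrors the idea that the quality constraint, not the algorithm, is what a scientist needs to express.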
