Reducing the HPC-datastorage footprint with MAFISC—Multidimensional Adaptive Filtering Improved Scientific data Compression

Large HPC installations today also include large data storage installations. Data compression can significantly reduce the amount of data stored, and one of our goals was to find out how much compression can achieve for climate data. The price of compression is, of course, the need for additional computational resources, so our second goal was to relate the savings from compression to the costs it incurs. In this paper we present the results of our analysis of typical climate data. We develop a lossless algorithm based on these insights and compare its compression ratio to that of standard compression tools. As it turns out, this algorithm is general enough to be useful for a large class of scientific data, which is why we speak of MAFISC as a method for scientific data compression. We identify a numerical problem for lossless compression of scientific data and give a possible solution. Finally, we discuss the economics of data compression in HPC environments using the example of the German Climate Computing Center.
