Data Compression for Climate Data

The computational power and storage capacity of supercomputers grow at different rates, which turns data storage into a technical and economic problem. Because storage capabilities are lagging behind, investment and operational costs for storage systems have increased to keep up with the supercomputers' I/O requirements. One promising approach is to reduce the amount of data that is stored. In this paper, we examine the impact of compression on the performance and costs of high-performance systems. To this end, we analyze the applicability of compression on all layers of the I/O stack, that is, main memory, network, and storage. Based on the Mistral system of the German Climate Computing Center (Deutsches Klimarechenzentrum, DKRZ), we illustrate potential performance improvements and cost savings. Making use of compression on a large scale can decrease investment and operational costs by 50% without negatively impacting performance. Additionally, we present ongoing work on supporting enhanced adaptive compression in the parallel distributed file system Lustre and on application-specific compression.
