A Statistical Analysis of Compressed Climate Model Data

The data storage burden resulting from large climate model simulations continues to grow. While lossy data compression methods can alleviate this burden, they introduce the possibility that key climate variables could be altered to the point of affecting scientific conclusions. It is therefore important to develop a detailed understanding of how compressed model output differs from the original. Here, we evaluate the effects of two leading compression algorithms, SZ and ZFP, on daily surface temperature and precipitation rate data from a popular climate model. While both algorithms show promising fidelity with the original output, detectable artifacts are introduced even at relatively low error tolerances. This study highlights the need for evaluation methods that are sensitive to errors at different spatiotemporal scales and specific to the particular climate variable of interest, with the ultimate goal of improving lossy compression in collaboration with the algorithm development teams.
