S-preconditioner for Multi-fold Data Reduction with Guaranteed User-Controlled Accuracy

The growing gap between the massive amounts of data generated by petascale scientific simulation codes and the capability of system hardware and software to effectively analyze this data necessitates data reduction. Yet the increasing data complexity challenges most, if not all, of the existing data compression methods. In fact, lossless compression techniques offer no more than 10% reduction on the scientific data that we have experience with, which is widely regarded as effectively incompressible. To bridge this gap, we advocate a transformative strategy that enables fast, accurate, and multi-fold reduction of double-precision floating-point scientific data. Our method is inspired by the effective use of preconditioners for linear algebra solvers optimized for a particular class of computational "dwarfs" (e.g., dense or sparse matrices). Focusing on a commonly used multi-resolution wavelet compression technique as the underlying "solver" for data reduction, we propose the S-preconditioner, which transforms scientific data into a form with high global regularity to ensure a significant decrease in the number of wavelet coefficients stored for a segment of data. Combined with the subsequent EQ-calibrator, our resultant method, called S-Preconditioned EQ-Calibrated Wavelets (SW), robustly achieved a 4- to 5-fold data reduction while guaranteeing user-defined accuracy of reconstructed data to be within 1% point-by-point relative error, lower than 0.01 Normalized RMSE, and higher than 0.99 Pearson Correlation. In this paper, we show the results we obtained by testing our method on six petascale simulation codes including fusion, combustion, climate, astrophysics, and subsurface groundwater in addition to 13 publicly available scientific datasets.
We also demonstrate that application-driven data mining tasks performed on decompressed variables or their derived quantities produce results comparable in quality to those obtained from the original data.
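
As an illustration of the three accuracy criteria quoted above (point-by-point relative error within 1%, Normalized RMSE below 0.01, Pearson correlation above 0.99), the following minimal Python sketch checks a reconstructed array against its original. It is not the paper's implementation: the function names are hypothetical, and two conventions are assumptions because the abstract does not define them, namely that NRMSE is normalized by the value range of the original data and that zero-valued original points are compared by absolute error.

```python
import math


def relative_errors(original, reconstructed):
    """Point-by-point relative error |x - y| / |x|.

    Assumption: where the original value is exactly zero, the absolute
    error is used instead, to avoid dividing by zero.
    """
    errors = []
    for x, y in zip(original, reconstructed):
        denom = abs(x) if x != 0.0 else 1.0
        errors.append(abs(x - y) / denom)
    return errors


def nrmse(original, reconstructed):
    """Root-mean-square error divided by the range of the original data.

    Assumption: range normalization; other conventions (e.g. dividing by
    the mean) exist and the abstract does not say which one is used.
    """
    n = len(original)
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / n
    value_range = max(original) - min(original)
    if value_range == 0.0:
        # Constant data: report 0 for an exact match, infinity otherwise.
        return 0.0 if mse == 0.0 else math.inf
    return math.sqrt(mse) / value_range


def pearson(original, reconstructed):
    """Pearson correlation coefficient between the two sequences."""
    n = len(original)
    mean_x = sum(original) / n
    mean_y = sum(reconstructed) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(original, reconstructed))
    var_x = sum((x - mean_x) ** 2 for x in original)
    var_y = sum((y - mean_y) ** 2 for y in reconstructed)
    return cov / math.sqrt(var_x * var_y)


def meets_accuracy(original, reconstructed,
                   max_rel=0.01, max_nrmse=0.01, min_corr=0.99):
    """True when all three thresholds quoted in the abstract are satisfied."""
    return (max(relative_errors(original, reconstructed)) <= max_rel
            and nrmse(original, reconstructed) < max_nrmse
            and pearson(original, reconstructed) > min_corr)


original = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.005, 1.99, 3.01, 4.0]
print(meets_accuracy(original, reconstructed))
```

The thresholds are keyword arguments so that a user-defined accuracy other than the values quoted in the abstract can be checked with the same function.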
