MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction

Data management is becoming increasingly important for handling the large amounts of data produced by large-scale scientific simulations and instruments. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale, but they may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework that builds on previous multilevel methods to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are fourfold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients at different levels. 2) We propose an adaptive decomposition method, which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare it with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70x, and that the proposed compression method improves the compression ratio by up to 2x compared with other state-of-the-art error-bounded lossy compressors at the same level of data distortion.
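To make the idea of level-wise coefficient quantization concrete, the following is a minimal sketch and not the MGARD+ algorithm itself. It assumes a 1D array of size 2^L + 1, uses a plain interpolation-based hierarchical decomposition (no projection or correction step), and splits a total tolerance across levels with a hypothetical geometric weighting. The function names and the `ratio` parameter are illustrative assumptions. For this simplified decomposition, the pointwise reconstruction error is bounded by the sum of the per-level tolerances, which is why the tolerances are normalized to sum to `tol`.

```python
import numpy as np

def decompose(u):
    """Split u (size 2^L + 1) into the two coarsest nodes plus per-level details."""
    u = np.asarray(u, dtype=float)
    details = []
    while u.size > 2:
        coarse = u[::2]
        # Detail = odd-node value minus linear interpolation of its coarse neighbors.
        details.append(u[1::2] - 0.5 * (coarse[:-1] + coarse[1:]))
        u = coarse
    return u, details  # details[0] is the finest level

def recompose(coarse, details):
    """Invert decompose(), rebuilding from the coarsest level to the finest."""
    u = np.asarray(coarse, dtype=float)
    for d in reversed(details):
        fine = np.empty(2 * u.size - 1)
        fine[::2] = u
        fine[1::2] = 0.5 * (u[:-1] + u[1:]) + d
        u = fine
    return u

def level_tolerances(tol, n_levels, ratio=0.5):
    """Hypothetical split: geometric weights, normalized so the tolerances sum to tol."""
    weights = ratio ** np.arange(n_levels)
    return tol * weights / weights.sum()

def quantize(x, t):
    """Uniform quantizer with bin width 2t, so |x - dequantize(q, t)| <= t."""
    return np.round(x / (2.0 * t)).astype(np.int64)

def dequantize(q, t):
    return q * (2.0 * t)

def roundtrip(u, tol):
    """Decompose, quantize each level with its own tolerance, and reconstruct."""
    coarse, details = decompose(u)
    tols = level_tolerances(tol, len(details))
    codes = [quantize(d, t) for d, t in zip(details, tols)]
    approx = [dequantize(q, t) for q, t in zip(codes, tols)]
    return recompose(coarse, approx), codes
```

Calling `roundtrip(np.sin(np.linspace(0.0, 3.0, 65)), 1e-3)` returns a reconstruction whose maximum pointwise error stays within the requested tolerance, along with the integer codes that a lossless encoder would then compress. The two coarsest nodal values are kept exact here for simplicity.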
