Paper information - MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction

Data management is becoming increasingly important in dealing with the large amounts of data produced by large-scale scientific simulations and instruments. Existing multilevel compression algorithms offer a promising way to manage scientific data at scale, but may suffer from relatively low performance and reduction quality. In this paper, we propose MGARD+, a multilevel data reduction and refactoring framework drawing on previous multilevel methods, to achieve high-performance data decomposition and high-quality error-bounded lossy compression. Our contributions are four-fold: 1) We propose a level-wise coefficient quantization method, which uses different error tolerances to quantize the multilevel coefficients. 2) We propose an adaptive decomposition method which treats the multilevel decomposition as a preconditioner and terminates the decomposition process at an appropriate level. 3) We leverage a set of algorithmic optimization strategies to significantly improve the performance of multilevel decomposition/recomposition. 4) We evaluate our proposed method using four real-world scientific datasets and compare with several state-of-the-art lossy compressors. Experiments demonstrate that our optimizations improve the decomposition/recomposition performance of the existing multilevel method by up to 70X, and the proposed compression method can improve compression ratio by up to 2X compared with other state-of-the-art error-bounded lossy compressors under the same level of data distortion.
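The idea of level-wise coefficient quantization can be illustrated with a small sketch. This is not the MGARD+ implementation: it assumes a 1D signal of length 2^L + 1, uses a plain linear-interpolation hierarchy with no projection step, and splits the error budget geometrically across levels. The function names, the `ratio` parameter, and the choice to give finer levels a larger share are all hypothetical choices made for illustration. In this simplified setting the pointwise error at any node is at most the sum of the per-level tolerances, so a split that sums to `tol` keeps the reconstruction within `tol` in the maximum norm.

```python
import numpy as np

def decompose(u):
    """Split u (length 2**L + 1) into coarsest nodal values and per-level
    coefficients, using linear interpolation as the predictor (an assumption)."""
    u = np.asarray(u, dtype=float)
    levels = []
    while u.size > 2:
        coarse = u[::2]
        predicted = 0.5 * (coarse[:-1] + coarse[1:])
        levels.append(u[1::2] - predicted)  # coefficients of the current finest level
        u = coarse
    return u, levels[::-1]  # coefficients ordered coarsest level first

def recompose(coarse, levels):
    """Invert decompose: interpolate, then add back the coefficients level by level."""
    u = np.asarray(coarse, dtype=float)
    for coeff in levels:
        fine = np.empty(2 * u.size - 1)
        fine[::2] = u
        fine[1::2] = 0.5 * (u[:-1] + u[1:]) + coeff
        u = fine
    return u

def level_tolerances(tol, n_parts, ratio=0.5):
    """Hypothetical budget split: finer levels get a larger share; shares sum to tol."""
    weights = ratio ** np.arange(n_parts)[::-1]
    return tol * weights / weights.sum()

def quantize(x, tau):
    """Uniform quantization with bin width 2*tau, so the error is at most tau."""
    return np.round(x / (2.0 * tau)).astype(np.int64)

def dequantize(q, tau):
    return q * (2.0 * tau)

def compress(u, tol):
    coarse, levels = decompose(u)
    parts = [coarse] + levels  # the coarsest nodal values get their own tolerance
    taus = level_tolerances(tol, len(parts))
    return [quantize(p, t) for p, t in zip(parts, taus)], taus

def decompress(qparts, taus):
    parts = [dequantize(q, t) for q, t in zip(qparts, taus)]
    return recompose(parts[0], parts[1:])

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 2**6 + 1)
    u = np.sin(2 * np.pi * x)
    qparts, taus = compress(u, tol=1e-3)
    v = decompress(qparts, taus)
    print("max abs error:", np.max(np.abs(u - v)))
```

The integer arrays returned by `compress` would normally be passed to a lossless entropy coder, which is omitted here. The error bound relies on the interpolation-only hierarchy: each reconstructed node inherits at most the largest error of its coarser neighbours plus its own quantization error.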
