Optimizing Lossy Compression with Adjacent Snapshots for N-body Simulation Data

Today’s N-body simulations produce extremely large amounts of data. The Hardware/Hybrid Accelerated Cosmology Code (HACC), for example, may simulate trillions of particles; according to HACC users, such runs produce tens of petabytes of data that must be stored in a parallel file system. In this paper, we design and implement an efficient, in situ, error-bounded lossy compressor that significantly reduces the data size of N-body simulations. Our compressor not only saves significant storage space for N-body simulation researchers but also considerably improves I/O performance, with limited memory and computation overhead. Our contribution is threefold. (1) We propose an efficient data compression model that leverages the consecutiveness of cosmological data in both the space and time dimensions, as well as the physical correlation across different fields. (2) We propose a lightweight, efficient mechanism for aligning the disordered particles across adjacent snapshots of the simulation, a fundamental step in the whole compression procedure. We also optimize compression quality by exploring best-fit data prediction strategies and by tuning how often space-based compression is used relative to time-based compression. (3) We evaluate our compressor on data from both a cosmological simulation package and a molecular dynamics simulation, the two major categories in the N-body simulation domain. Experiments show that, at the same level of data distortion, our solution achieves compression ratios up to 43% higher on the velocity field and up to 300% higher on the position field than other state-of-the-art compressors (including SZ, ZFP, NUMARCK, and decimation). With our compressor, the overall I/O time on HACC data is reduced by up to 20% compared with the second-best compressor.
