Use cases of lossy compression for floating-point data in scientific data sets

Architectural and technological trends of systems used for scientific computing call for a significant reduction in the size of scientific data sets that are composed mainly of floating-point data. This article surveys currently identified use cases of generic lossy compression that address the different limitations of scientific computing systems, and presents experimental results for them. Experiments run on parallel systems of a leadership facility show that lossy data compression can not only reduce the storage footprint of scientific data sets but also reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than without lossy compression. These results suggest that lossy compression will become an important technology in many aspects of high-performance scientific computing. Because the constraints for each use case are different and often conflicting, this collection of results also indicates the need for more specialization of the compression pipelines.
