Lossless Compression of SKA Data Sets

With astronomical data archives continuing to grow at an enormous rate, both the providers and the end users of astronomical data sets stand to benefit from effective data compression. This paper explores lossless data compression techniques with the aim of finding an optimal algorithm for compressing astronomical data obtained by the Square Kilometre Array (SKA), which are new and unique in the field of radio astronomy. The compression was required to be lossless and to be performed while the data are being read. The project was carried out in conjunction with the SKA South Africa office. Data compression reduces the time and bandwidth needed to transfer files, and it can also reduce the cost of data storage. The SKA uses the Hierarchical Data Format (HDF5) to store the data collected from the radio telescopes; the data sets used in this study range from 29 MB to 9 GB in size. The compression techniques investigated are SZIP, GZIP, the LZF filter, LZ4 and the Fully Adaptive Prediction Error Coder (FAPEC). The algorithms and methods used to perform the compression tests are discussed, and the results from the three phases of testing are presented, followed by a brief discussion of those results.
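As an illustration of the kind of comparison described above, the following minimal sketch measures compression ratio and compression time for a few lossless codecs and checks that each round trip reproduces the input exactly. It is not the test harness used in the study. The codecs are those in the Python standard library: only `zlib` (the DEFLATE algorithm that GZIP uses) corresponds to a technique named above, while `bz2` and `lzma` are stand-ins for SZIP, LZF, LZ4 and FAPEC, which the standard library does not provide. The helper names `make_sample` and `benchmark` are hypothetical, and the synthetic sample merely substitutes for real SKA HDF5 data.

```python
import bz2
import lzma
import random
import struct
import time
import zlib


def make_sample(n_values=50_000, seed=0):
    """Build synthetic data (an assumption, not SKA data):
    slowly varying unsigned 16-bit samples, packed little-endian."""
    rng = random.Random(seed)
    value = 1000
    samples = []
    for _ in range(n_values):
        # Random walk, clamped to the 16-bit unsigned range.
        value = max(0, min(65535, value + rng.randint(-3, 3)))
        samples.append(value)
    return struct.pack(f"<{n_values}H", *samples)


# Standard-library codecs; only zlib matches a technique named in the text.
CODECS = {
    "zlib (DEFLATE, as used by GZIP)": (zlib.compress, zlib.decompress),
    "bz2 (stand-in)": (bz2.compress, bz2.decompress),
    "lzma (stand-in)": (lzma.compress, lzma.decompress),
}


def benchmark(data):
    """Return {codec name: (compression ratio, compression time in seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Lossless requirement: decompression must reproduce the input exactly.
        if decompress(packed) != data:
            raise ValueError(f"{name} did not round-trip the data")
        results[name] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    for name, (ratio, seconds) in benchmark(make_sample()).items():
        print(f"{name}: ratio {ratio:.2f}, {seconds * 1000:.1f} ms")
```

The ratios and timings depend strongly on the input, so figures obtained from the synthetic sample say nothing about how these codecs would perform on actual SKA data sets.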
