Efficient data storage of astronomical data using HDF5 and PEC compression

Future space missions rely on a new generation of instruments that often generate vast amounts of data. Transferring these data to the ground, and then between different computing facilities, is a demanding task. A clear example is Gaia, a space astrometry mission of ESA. An international consortium has been set up to carry out the data reduction tasks on the ground. Among these tasks, perhaps the most demanding is the Intermediate Data Updating, which will have to repeatedly re-process nearly 100 TB of raw data received from the satellite using the latest instrument calibrations available. One of the best data compression solutions is the Prediction Error Coder (PEC), a highly optimized entropy coder that performs very well with data following realistic statistics. Regarding file formats, HDF5 provides a fully indexed, easily customizable file with fast, parallel access. It also offers a friendly presentation format and multi-platform compatibility, which makes it a powerful environment for storing data compressed with this coder. Here we show the integration of both systems for the storage of Gaia raw data, although the same integration can be applied to the efficient storage of any kind of data. We also show that the file sizes obtained with this solution are similar to those obtained with other compression algorithms that require more computing power.
