Selecting a general-purpose data compression algorithm

The National Space Science Data Center's Common Data Formate (CDF) is capable of storing many types of data such as scalar data items, vectors, and multidimensional arrays of bytes, integers, or floating point values. However, regardless of the dimensionality and data type, the data break down into a sequence of bytes that can be fed into a data compression function to reduce the amount of data without losing data integrity and thus remaining fully reconstructible. Because of the diversity of data types and high performance speed requirements, a general-purpose, fast, simple data compression algorithm is required to incorporate data compression into CDF. The questions to ask are how to evaluate and compare compression algorithms, and what compression algorithm meets all requirements. The object of this paper is to address these questions and determine the most appropriate compression algorithm to use within the CDF data management package that would be applicable to other software packages with similar data compression needs.