High Performance ZLIB Compression on Intel® Architecture Processors

The need for lossless data compression has grown significantly as the amount of data collected, transmitted, and stored has exploded in recent years. Enterprise applications and storage systems, such as web servers and databases, must process this data, and the computational burden of compression strains their resources. To help alleviate this burden, we introduce an optimized implementation of the industry-standard DEFLATE algorithm that can be used in common libraries such as Zlib. This paper describes a high-performance implementation of Zlib compression on Intel processors. Because the performance of compression implementations is data dependent, we base our comparisons on an industry-standard data set. We demonstrate substantial performance gains for all nine levels of Zlib compression, with compression ratios comparable to the baseline (except for level-1). We also introduce a new level-1 that provides significantly greater performance at the cost of some loss in compression ratio. For the default level 6 compression, our implementation is ~1.8X as fast as the latest available version of Zlib (1.2.8) on the Intel® Core™ i7 4770 processor (Haswell).

Overview

This paper describes a fast implementation of Zlib compression on Intel processors. All nine levels of compression have been improved. The new level-1 delivers much greater performance with some loss in compression ratio; the remaining levels leave the compression ratio unchanged while providing substantial performance gains.
Our implementation maintains ABI compatibility with Zlib, and thus functions as a drop-in replacement.
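
Because the library interface is unchanged, existing code that calls the standard Zlib API should run without modification. As an illustration only, the following minimal Python sketch compresses one buffer at each of the nine levels and reports compression ratio and elapsed time. It uses Python's standard zlib module, which wraps whatever Zlib library the interpreter was built against; whether it picks up a replacement library depends on how that interpreter is linked. The helper name measure_levels, the sample input, and the definition of ratio as compressed size divided by original size are assumptions made for this sketch, not part of the paper's methodology.

```python
import time
import zlib


def measure_levels(data: bytes, levels=range(1, 10)):
    """Compress `data` at each zlib level; return (level, ratio, seconds) tuples.

    `ratio` here is compressed size / original size (an assumption for this
    sketch), so smaller values mean better compression.
    """
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Round-trip check: the output must decompress to the original input.
        assert zlib.decompress(compressed) == data
        results.append((level, len(compressed) / len(data), elapsed))
    return results


if __name__ == "__main__":
    # Hypothetical sample input; substitute files from a real test corpus.
    sample = b"The quick brown fox jumps over the lazy dog. " * 20000
    for level, ratio, seconds in measure_levels(sample):
        print(f"level {level}: ratio {ratio:.4f}, {seconds * 1000:.2f} ms")
```

Timings from a single run on synthetic input are noisy and highly data dependent, so a meaningful comparison would repeat each measurement on representative data.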