High Performance DEFLATE Compression on Intel® Architecture Processors
There is a critical need for lossless data compression in enterprise storage and in applications such as databases and web servers, which process huge amounts of data. DEFLATE is a widely used standard for lossless compression and forms the basis of utilities such as gzip and libraries such as Zlib. In these applications, compression imposes a large computational burden on the servers, which could benefit from a highly optimized implementation. This paper describes the performance characteristics of fast prototype implementations of DEFLATE compression on Intel® processors based on the 32-nm microarchitecture. Because compression performance is data dependent, we report results on several industry-standard corpora. In terms of throughput, we are able to perform DEFLATE compression at an aggregate rate of ~2.7 Gigabits/sec on the Calgary Corpus data set, on a single core of an Intel® Core™ i5 650 processor. Our fastest DEFLATE compression implementation is ~4.5 times as fast as the fastest mode of the best open-source version of Zlib compression on the Intel® Core™ i5 650 processor. To achieve such large performance gains, we sacrifice a small amount of compressibility compared to Zlib-1 (Zlib's fastest mode, compression level 1).
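The abstract compares implementations along two axes: throughput, in Gbit/s of input, and compressibility relative to Zlib-1. The sketch below shows one way to measure both using Python's standard `zlib` module at a chosen compression level. It is a minimal illustration, not the authors' benchmark harness. The function names `measure_deflate` and `gbps_to_megabytes_per_sec`, the best-of-N timing, and the convention of reporting throughput against the uncompressed input size are all assumptions. Timings taken from Python are not comparable to the paper's numbers.

```python
import time
import zlib
from typing import Tuple


def measure_deflate(data: bytes, level: int = 1, repeats: int = 5) -> Tuple[float, float]:
    """Compress `data` with zlib's DEFLATE encoder at `level` and return
    (compression ratio, throughput in Gbit/s of uncompressed input).

    Level 1 is zlib's fastest setting, i.e. the "Zlib-1" baseline.
    Hypothetical helper for illustration only.
    """
    best_seconds = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        best_seconds = min(best_seconds, time.perf_counter() - start)

    # DEFLATE is lossless: decompressing must reproduce the input exactly.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round-trip mismatch")

    ratio = len(data) / len(compressed)  # > 1.0 means the data shrank
    # Bits of *uncompressed* input processed per second (best run), in Gbit/s.
    # Guard against a zero timer reading on very small inputs.
    gbit_per_sec = len(data) * 8 / max(best_seconds, 1e-9) / 1e9
    return ratio, gbit_per_sec


def gbps_to_megabytes_per_sec(gbps: float) -> float:
    """Convert decimal Gbit/s to decimal MB/s (1 byte = 8 bits)."""
    return gbps * 1e9 / 8 / 1e6


if __name__ == "__main__":
    # Hypothetical synthetic input; real measurements would use a corpus such as Calgary.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for lvl in (1, 6, 9):
        r, g = measure_deflate(sample, level=lvl)
        print(f"level {lvl}: ratio {r:.1f}x, {g:.2f} Gbit/s")
    print(gbps_to_megabytes_per_sec(2.7))  # 337.5
```

For scale, the reported ~2.7 Gbit/s corresponds to roughly 337.5 MB/s of input. Taking the best of several runs reduces timer noise, but the measured time still includes Python call overhead. Highly repetitive synthetic input compresses unrealistically well at every level.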