OpenCL-based optimization methods for utilizing forward DCT and quantization of image compression on a heterogeneous platform

Recent computer systems and handheld devices are equipped with high-performance computing resources, such as general-purpose GPUs (GPGPUs) and multi-core CPUs. Using these resources for computation has become a general trend, which makes their availability an important factor in real-time processing. The discrete cosine transform (DCT) and quantization are two major, computationally intensive operations in image compression standards. In this paper, we develop an efficient parallel implementation of the forward DCT and quantization algorithms for JPEG image compression using the Open Computing Language (OpenCL). This OpenCL-based implementation uses a multi-core CPU and a GPGPU to perform the DCT and quantization computations, and we demonstrate its capability in two proposed working scenarios. The approach also applies optimization techniques to reduce kernel execution time and data movement. We develop an optimized OpenCL kernel for a particular device using device-specific optimization factors: thread granularity, work-item mapping, workload allocation, and vector-based memory access. Evaluated in a heterogeneous environment, the proposed parallel implementation speeds up the execution of DCT and quantization by factors of 7.97 and 8.65, respectively, obtained from 1024 × 1024 and 2084 × 2048 image sizes in 4:4:4 format.
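To make the two operations concrete, the following is a minimal sequential sketch in Python of the forward DCT and quantization of a single 8 × 8 block. It is not the paper's OpenCL kernel. The function names (`fdct_8x8`, `quantize`, `process_block`) are hypothetical, the DCT is the direct textbook form rather than a fast algorithm, and the table is the example luminance quantization table commonly reproduced from Annex K of the JPEG standard, which is an assumption about the table used.

```python
import math

N = 8  # JPEG processes each component in independent 8x8 blocks

# Example luminance quantization table (JPEG standard, Annex K).
# Assumption: the paper does not state which table it uses.
QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def fdct_8x8(block):
    """Direct 2-D forward DCT-II of one level-shifted 8x8 block.

    u indexes the row (vertical) frequency, v the column (horizontal) one.
    """
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        cu = 1 / math.sqrt(2) if u == 0 else 1.0
        for v in range(N):
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * cu * cv * s
    return out


def quantize(coeffs, qtable=QTABLE):
    """Divide each coefficient by its table entry and round.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [[round(coeffs[u][v] / qtable[u][v]) for v in range(N)]
            for u in range(N)]


def process_block(pixels):
    """Level-shift 8-bit samples by 128, then apply DCT and quantization."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return quantize(fdct_8x8(shifted))


if __name__ == "__main__":
    flat = [[136] * N for _ in range(N)]  # constant block: only DC survives
    for row in process_block(flat):
        print(row)
```

Because `process_block` reads and writes only its own block, every 8 × 8 block of an image can be processed independently. That independence is what allows the work to be divided among OpenCL work-items and between the CPU and the GPGPU, although the mapping used in the paper is not shown here.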
