Parallel implementations of frame rate up-conversion algorithm using OpenCL on heterogeneous computing devices

Frame rate up-conversion (FRUC) is a video post-processing technique that raises the frame rate of a video by inserting interpolated frames between adjacent original frames. Because its computational cost grows rapidly with video resolution and frame rate, FRUC is a natural candidate for acceleration by parallel computing. This paper proposes a parallel FRUC algorithm with two main parts: a parallel motion estimation algorithm based on three-dimensional recursive search (3DRS) and a parallel motion compensation algorithm. The motion estimation module exploits parallelism at two granularities, the macro-block level and the candidate-motion-vector level, while the motion compensation module exploits pixel-level parallelism. The proposed algorithm was tested on several hardware platforms. Compared with a sequential CPU implementation, it achieves speedups of up to 96× for 1920 × 1080 video and up to 254× for 3840 × 2160 video. Moreover, the OpenCL program of the parallel FRUC algorithm shows good portability across various GPU platforms.
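
The two stages named in the abstract can be illustrated with a minimal sequential sketch in plain Python. It is not the paper's OpenCL implementation: the sum of absolute differences (SAD) matching cost, the 4 × 4 block size, the hand-written candidate list, and all function names are assumptions made for illustration. The comments mark the loops whose iterations are independent and could therefore map to macro-block-level, candidate-level, and pixel-level parallelism.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def pixel(frame, x, y):
    """Read a pixel, clamping coordinates to the frame border."""
    h, w = len(frame), len(frame[0])
    return frame[clamp(y, 0, h - 1)][clamp(x, 0, w - 1)]

def sad(prev, curr, bx, by, mv, block=4):
    """Matching cost (assumed here: SAD) of one candidate vector for one block.
    Convention: content at (x, y) in curr came from (x - dx, y - dy) in prev."""
    dx, dy = mv
    total = 0
    for y in range(by, by + block):
        for x in range(bx, bx + block):
            total += abs(pixel(curr, x, y) - pixel(prev, x - dx, y - dy))
    return total

def best_candidate(prev, curr, bx, by, candidates, block=4):
    """Pick the cheapest candidate. Each SAD is independent of the others,
    which is what candidate-level parallelism would exploit."""
    return min(candidates, key=lambda mv: sad(prev, curr, bx, by, mv, block))

def interpolate(prev, curr, mv_field, block=4):
    """Build the midpoint frame. Every output pixel is computed independently,
    which is what pixel-level parallelism would exploit."""
    h, w = len(curr), len(curr[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = mv_field[y // block][x // block]
            a = pixel(prev, x - dx // 2, y - dy // 2)            # half step back
            b = pixel(curr, x + dx - dx // 2, y + dy - dy // 2)  # remaining step forward
            out[y][x] = (a + b + 1) // 2                         # rounded average
    return out

def make_frame(x0, size=8):
    """Hypothetical test frame: a bright 2x2 square at columns x0..x0+1, rows 2..3."""
    f = [[0] * size for _ in range(size)]
    for y in (2, 3):
        for x in (x0, x0 + 1):
            f[y][x] = 255
    return f

prev_frame = make_frame(2)   # square at x = 2..3
curr_frame = make_frame(4)   # square moved 2 pixels to the right
candidates = [(0, 0), (2, 0), (0, 2), (-2, 0)]  # hand-picked, not 3DRS candidates

# One vector per 4x4 block; the blocks are independent in this sketch
# (macro-block-level parallelism).
mv_field = [[best_candidate(prev_frame, curr_frame, bx, by, candidates)
             for bx in range(0, 8, 4)]
            for by in range(0, 8, 4)]
mid_frame = interpolate(prev_frame, curr_frame, mv_field)
```

In this toy case the block containing the square selects the vector (2, 0), and the interpolated frame places the square halfway, at columns 3 and 4. Note that in actual 3DRS the candidates are taken from spatially and temporally neighbouring blocks, so blocks depend on one another; the sketch sidesteps that dependency by using a fixed candidate list.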
