Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA

Video coding must balance coding rate against image quality. We apply a global motion search algorithm to avoid loss of image quality, and we use the parallel computing capacity of graphics processors to accelerate the encoding process. Based on the heterogeneous CPU+GPU system and on the multi-threaded parallel structure and thread synchronization features of the CUDA platform, we build a computing model suited to global motion search on CUDA. The model uses the CUDA thread synchronization mechanism to solve the data consistency problem and to improve the efficiency of on-chip data communication, and it uses the CUDA asynchronous mechanism to hide the delay caused by CPU functions. Experimental results show that the CUDA-based parallel computing model significantly improves the efficiency of the motion estimation algorithm, and that the asynchronous parallel model built on the CUDA asynchronous mechanism yields a further gain.
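The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a full search over a ±8 pixel window with the sum of absolute differences (SAD) as the matching cost, and the frame size, block position, and all identifiers (`sadKernel`, `cpuSad`, `BX`, `BY`) are hypothetical. Each thread block evaluates one candidate displacement and uses `__syncthreads()` to keep the shared-memory reduction consistent. On the host, two streams with `cudaMemcpyAsync` let the CPU compute a reference result while the GPU works.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// All sizes below are illustrative assumptions, not values from the paper.
constexpr int W = 64, H = 64;    // frame size
constexpr int B = 16;            // macroblock size (B*B threads per block)
constexpr int R = 8;             // search range: displacements in [-R, R]
constexpr int S = 2 * R + 1;     // candidates per axis
constexpr int BX = 24, BY = 24;  // top-left corner of the block being matched

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)

// One thread block per candidate displacement; one thread per pixel.
__global__ void sadKernel(const unsigned char* cur, const unsigned char* ref, int* sad) {
    __shared__ int partial[B * B];
    int dx = (int)blockIdx.x - R, dy = (int)blockIdx.y - R;
    int tx = threadIdx.x, ty = threadIdx.y, tid = ty * B + tx;
    int c = cur[(BY + ty) * W + BX + tx];
    int r = ref[(BY + dy + ty) * W + BX + dx + tx];
    partial[tid] = abs(c - r);
    __syncthreads();  // all differences must be written before reducing
    for (int stride = B * B / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();  // keep each reduction step consistent across threads
    }
    if (tid == 0) sad[blockIdx.y * S + blockIdx.x] = partial[0];
}

// CPU reference for a single candidate displacement.
int cpuSad(const unsigned char* cur, const unsigned char* ref, int dx, int dy) {
    int s = 0;
    for (int y = 0; y < B; ++y)
        for (int x = 0; x < B; ++x)
            s += abs(cur[(BY + y) * W + BX + x] - ref[(BY + dy + y) * W + BX + dx + x]);
    return s;
}

int main() {
    const int N = 2;  // two frame pairs, one stream each
    const size_t frameBytes = W * H, sadBytes = S * S * sizeof(int);
    unsigned char *hCur, *hRef, *dCur, *dRef;
    int *hSad, *dSad;
    // Pinned host memory is required for copies to be truly asynchronous.
    CHECK(cudaMallocHost(&hCur, N * frameBytes));
    CHECK(cudaMallocHost(&hRef, N * frameBytes));
    CHECK(cudaMallocHost(&hSad, N * sadBytes));
    CHECK(cudaMalloc(&dCur, N * frameBytes));
    CHECK(cudaMalloc(&dRef, N * frameBytes));
    CHECK(cudaMalloc(&dSad, N * sadBytes));
    srand(1);
    for (size_t i = 0; i < N * frameBytes; ++i) { hCur[i] = rand() & 255; hRef[i] = rand() & 255; }

    cudaStream_t st[N];
    for (int i = 0; i < N; ++i) {
        CHECK(cudaStreamCreate(&st[i]));
        // Copy in, compute, copy out: all queued without blocking the CPU.
        CHECK(cudaMemcpyAsync(dCur + i * frameBytes, hCur + i * frameBytes, frameBytes, cudaMemcpyHostToDevice, st[i]));
        CHECK(cudaMemcpyAsync(dRef + i * frameBytes, hRef + i * frameBytes, frameBytes, cudaMemcpyHostToDevice, st[i]));
        sadKernel<<<dim3(S, S), dim3(B, B), 0, st[i]>>>(dCur + i * frameBytes, dRef + i * frameBytes, dSad + i * S * S);
        CHECK(cudaMemcpyAsync(hSad + i * S * S, dSad + i * S * S, sadBytes, cudaMemcpyDeviceToHost, st[i]));
    }

    // CPU work overlapped with the GPU: here, the reference SADs.
    static int expect[N * S * S];
    for (int i = 0; i < N; ++i)
        for (int cy = 0; cy < S; ++cy)
            for (int cx = 0; cx < S; ++cx)
                expect[(i * S + cy) * S + cx] = cpuSad(hCur + i * frameBytes, hRef + i * frameBytes, cx - R, cy - R);

    CHECK(cudaDeviceSynchronize());  // wait for both streams before reading hSad
    bool ok = true;
    for (int i = 0; i < N * S * S; ++i) ok = ok && (hSad[i] == expect[i]);
    printf(ok ? "GPU and CPU SADs agree\n" : "mismatch\n");

    for (int i = 0; i < N; ++i) cudaStreamDestroy(st[i]);
    cudaFree(dCur); cudaFree(dRef); cudaFree(dSad);
    cudaFreeHost(hCur); cudaFreeHost(hRef); cudaFreeHost(hSad);
    return ok ? 0 : 1;
}
```

In a real encoder the overlapped CPU work would be something other than a reference check, for example entropy coding of the previous frame. The motion vector would then be taken as the candidate with the smallest SAD.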
