Implementation of a CUDA-Based Real-Time Motion Estimation Algorithm

This paper proposes a parallelization approach for the full search motion estimation algorithm on a GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture). The proposed approach minimizes off-chip memory access by taking full advantage of the GPU's high-speed on-chip memory. It also maximizes the number of active threads in parallel reduction operations, according to the data dependencies and the amount of data to be processed. Experimental results show that the GPU implementation is up to 92 times faster than a CPU-only implementation and that real-time performance can be achieved for 1080p video.
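
For readers unfamiliar with full search, the following is a minimal CPU-side sketch of the computation being parallelized. It is not the GPU kernel described above. It assumes the sum of absolute differences (SAD) as the matching cost, which the abstract does not state, and the names `tree_reduce_sum` and `full_search`, along with the `block` and `radius` parameters, are hypothetical. The pairwise halving loop imitates the access pattern of a parallel reduction, where each inner-loop iteration would correspond to one active thread.

```python
def tree_reduce_sum(values):
    """Sum a list by pairwise halving, as a parallel reduction would."""
    buf = list(values)
    n = len(buf)
    while n > 1:
        half = (n + 1) // 2
        # On a GPU, each index i below would be handled by its own thread.
        for i in range(n // 2):
            buf[i] += buf[i + half]
        n = half
    return buf[0] if buf else 0


def full_search(cur, ref, bx, by, block, radius):
    """Return (dx, dy, sad) of the best match for the block at (bx, by).

    cur and ref are frames given as lists of rows of integer pixel values.
    Every displacement within +/- radius is tested (assumed SAD cost).
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + block > w or y0 + block > h:
                continue  # candidate block falls outside the reference frame
            diffs = [abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                     for j in range(block) for i in range(block)]
            cost = tree_reduce_sum(diffs)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Every candidate displacement is independent of the others, and the per-pixel differences within a candidate are independent as well, which is what makes the search a natural fit for many concurrent threads. Only the summation and the final minimum selection require cooperation between threads.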