GPU acceleration of object detection on video stream using CUDA

Object detection is an important application of computer vision and of image and video processing. However, combining high accuracy with fast detection that is invariant to changes in object state (position, scale, illumination, and noise) is a central problem in object detection for video frames and images, and it cannot be realized by sequential processing on a single-core General Purpose Central Processing Unit (GPCPU). To address these problems and speed up the computationally intensive calculations involved, this paper proposes a simple and efficient template-matching algorithm for object detection. The algorithm slides a window across the video frame and applies one of two similarity measures as the window function: the Sum of Absolute Differences (SAD) and a pyramid-downscaled, multi-resolution Sum of Absolute Differences, called PSAD. The implementation runs on a Graphics Processing Unit (GPU) and exploits parallel processing through Data Level Parallelism (DLP) and single-instruction, multiple-data (SIMD) operations with the Compute Unified Device Architecture (CUDA). For an image size of 768×567 in the MATLAB environment, speedups of 161x and 97x are achieved for SAD and PSAD, respectively.
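
As an illustration of the two window functions named above, the following is a minimal sequential sketch in plain Python. It is a CPU reference under stated assumptions, not the authors' CUDA or MATLAB implementation. The function names (`sad`, `match_sad`, `downscale`, `match_psad`) are hypothetical, and the 2x2 block averaging and the one-pixel refinement neighbourhood are assumed choices, since the abstract does not specify how the pyramid levels are built or combined.

```python
def sad(frame, template, top, left):
    """Sum of absolute differences between the template and one window."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_sad(frame, template):
    """Exhaustive sliding-window search; returns (score, top, left)."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    # Each window score depends only on its own pixels, so the positions
    # are independent: the property a data-parallel mapping relies on.
    return min(
        (sad(frame, template, y, x), y, x)
        for y in range(fh - th + 1)
        for x in range(fw - tw + 1)
    )


def downscale(img):
    """Halve both dimensions by averaging 2x2 blocks (assumed pyramid step)."""
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) // 4
            for c in range(0, len(img[0]) - 1, 2)
        ]
        for r in range(0, len(img) - 1, 2)
    ]


def match_psad(frame, template, levels=1):
    """Coarse-to-fine search: match on the downscaled pair, then refine.

    Assumption: the coarse hit is refined within one pixel at the next
    finer level; other pyramid schemes are equally plausible.
    """
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    if levels == 0 or min(th, tw) < 4:
        return match_sad(frame, template)
    _, cy, cx = match_psad(downscale(frame), downscale(template), levels - 1)
    # Map the coarse position back to this level and search around it.
    ys = range(max(0, 2 * cy - 1), min(fh - th, 2 * cy + 1) + 1)
    xs = range(max(0, 2 * cx - 1), min(fw - tw, 2 * cx + 1) + 1)
    return min((sad(frame, template, y, x), y, x) for y in ys for x in xs)
```

Both functions return the lowest SAD score together with the top-left corner of the best window. The pyramid variant scores far fewer full-resolution windows than the exhaustive search, at the cost of possibly missing a match that the coarse level mislocates.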