A fast feature extraction in object recognition using parallel processing on CPU and GPU

Due to the advents of multi-core CPU and GPU, various parallel processing techniques have been widely applied to many application fields including computer vision. This paper presents a parallel processing technique for realtime feature extraction in object recognition by autonomous mobile robots, which utilizes both CPU and GPU by combining OpenMP, SSE (Streaming SIMD Extension) and CUDA programming. Firstly, the algorithms and codes for feature extraction are optimized and implemented in parallel processing. After the parallel algorithms are assured to maintain the same level of performance, the process for extracting key points and obtaining dominant orientation with respect to the key points is parallelized. Following the extraction is the construction of a parallel descriptor via SSE instructions. Finally, the GPU version of SIFT is also implemented using CUDA. The experiments have shown that the CPU version of SIFT is almost five times faster than the original SIFT while maintaining robust performance. Further, the GPU-Parallel descriptor achieves acceleration up to five times higher than the CPU-Parallel descriptor at a cost of a bit lower performance.

[1]  Rohit Chandra,et al.  Parallel programming in openMP , 2000 .

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  S. Ullman High-Level Vision: Object Recognition and Visual Cognition , 1996 .

[4]  Jens H. Krüger,et al.  A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.

[5]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[6]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  E. Rolls High-level vision: Object recognition and visual cognition, Shimon Ullman. MIT Press, Bradford (1996), ISBN 0 262 21013 4 , 1997 .

[8]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[9]  Shi-Jinn Horng,et al.  Run-length chain coding and scalable computation of a shape's moments using reconfigurable optical buses , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[10]  Stamatis Vassiliadis,et al.  Performance comparison of SIMD implementations of the discrete wavelet transform , 2005, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05).

[11]  Cristina Nicolescu,et al.  A data and task parallel image processing environment , 2002, Parallel Comput..

[12]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[13]  Stefano Cagnoni,et al.  Evolving binary classifiers through parallel computation of multiple fitness cases , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[14]  Jan-Michael Frahm,et al.  Feature tracking and matching in video using programmable graphics hardware , 2007, Machine Vision and Applications.

[15]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .