Accelerating stereo vision algorithm using SSE3, AVX2, and CUDA

Stereo vision is widely used in robotics, unmanned cars, aerial surveys, and many other real-time applications. It is also computationally expensive because of the stereo matching step. In real-time applications, the execution time of the stereo vision depth detection algorithm is critical. This paper studies how Intel SIMD instructions and CUDA reduce the execution time of stereo vision. Both improve performance by exploiting data-level parallelism. We present a fast implementation of the sum of squared differences (SSD) stereo vision algorithm on Intel processors using the SSE3 and AVX2 SIMD instruction sets and on an NVIDIA graphics processing unit (GPU) using CUDA, and we compare the results with a serial implementation. The algorithm was applied to different disparity ranges (from 16 to 256), window sizes (from 3×3 to 15×15), and image resolutions (from 256×212 to 1408×1168). For a disparity range of 64 and a window size of 3×3, we achieved 182 frames per second with CUDA, 64 frames per second with AVX2, and 25 frames per second with SSE3. Experimental results show speedups of up to 5× with SSE3, 10× with AVX2, and 21× with CUDA compared to the serial implementation.
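To make the SSD matching step concrete, the following is a minimal scalar sketch of window-based SSD disparity search in C++. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the `GrayImage` type, the `ssd_disparity` function, the row-major 8-bit layout, and the left-to-right search direction are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper type: a row-major 8-bit grayscale image.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;

    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                      static_cast<std::size_t>(x)];
    }
};

// Scalar baseline (assumed formulation): for each left-image pixel, pick the
// disparity d in [0, max_disparity) that minimizes the sum of squared
// differences over a (2*radius+1) x (2*radius+1) window, comparing
// left(x, y) against right(x - d, y). Border pixels keep disparity 0.
std::vector<int> ssd_disparity(const GrayImage& left, const GrayImage& right,
                               int radius, int max_disparity) {
    assert(left.width == right.width && left.height == right.height);
    const int w = left.width;
    const int h = left.height;
    std::vector<int> disparity(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0);

    for (int y = radius; y < h - radius; ++y) {
        for (int x = radius; x < w - radius; ++x) {
            long best_cost = std::numeric_limits<long>::max();
            int best_d = 0;
            // Stop once the shifted window would leave the right image.
            for (int d = 0; d < max_disparity && x - d - radius >= 0; ++d) {
                long cost = 0;
                // The window accumulation below is the data-parallel part
                // that SIMD lanes or GPU threads would process together.
                for (int dy = -radius; dy <= radius; ++dy) {
                    for (int dx = -radius; dx <= radius; ++dx) {
                        const int diff = static_cast<int>(left.at(x + dx, y + dy)) -
                                         static_cast<int>(right.at(x + dx - d, y + dy));
                        cost += static_cast<long>(diff) * diff;
                    }
                }
                // Strict comparison keeps the smallest disparity on ties.
                if (cost < best_cost) {
                    best_cost = cost;
                    best_d = d;
                }
            }
            disparity[static_cast<std::size_t>(y) * static_cast<std::size_t>(w) +
                      static_cast<std::size_t>(x)] = best_d;
        }
    }
    return disparity;
}
```

The cost of this loop nest grows with image resolution, disparity range, and window area, which are the three parameters varied in the experiments. Every pixel and every candidate disparity is evaluated independently, which is what makes the algorithm a natural fit for data-level parallelism.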
