Fast implementation of dense stereo vision algorithms on a highly parallel SIMD architecture

In this paper, we present a faster-than-real-time implementation of a class of dense stereo vision algorithms on a low-power, massively parallel SIMD architecture, the CSX700. With two cores, each with 96 Processing Elements, this SIMD architecture provides a peak computation power of 96 GFLOPS while consuming only 9 Watts, making it an excellent candidate for embedded computing applications. Exploiting the full features of this architecture, we have developed schemes for an efficient parallel implementation with minimal overhead. For the sum of squared differences (SSD) algorithm on VGA (640 × 480) images with disparity ranges of 16 and 32, we achieve 179 and 94 frames per second (fps), respectively. For HDTV (1,280 × 720) images with the same disparity ranges of 16 and 32, we achieve 67 and 35 fps, respectively. We have also implemented more accurate, and hence more computationally expensive, variants of the SSD algorithm, and in most cases, particularly for VGA images, we have achieved faster-than-real-time performance. Our results demonstrate that, with carefully developed parallelization schemes, the CSX architecture can provide excellent performance and flexibility for various embedded vision applications.
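For readers unfamiliar with the SSD algorithm named in the abstract, the following is a minimal sequential sketch of SSD block matching with a winner-takes-all disparity choice. It is not the parallel CSX700 scheme described in the paper; the function name `ssd_disparity`, the `radius` window parameter, and the list-of-rows image representation are all assumptions made for illustration.

```python
def ssd_disparity(left, right, max_disp, radius=1):
    """Winner-takes-all disparity map by SSD block matching.

    left, right: rectified grayscale images as equal-sized lists of rows.
    max_disp:    number of candidate disparities (0 .. max_disp - 1).
    radius:      half-width of the square matching window (hypothetical default).
    Border pixels whose window would leave the image keep disparity 0.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            best_cost, best_d = None, 0
            for d in range(max_disp):
                if x - d - radius < 0:
                    break  # shifted window would leave the right image
                cost = 0
                # Sum of squared differences over the matching window.
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        diff = left[y + dy][x + dx] - right[y + dy][x + dx - d]
                        cost += diff * diff
                # Keep the first disparity with the strictly lowest cost.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

The cost of this naive form grows with image size, disparity range, and window area, which is why the frame rates quoted above drop roughly by half when the disparity range doubles from 16 to 32.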
