Accelerating Cost Aggregation for Real-Time Stereo Matching

Real-time stereo matching is important in many applications, such as self-driving cars and 3-D scene reconstruction, and it requires high computational throughput and high memory bandwidth. The most time-consuming part of stereo-matching algorithms is the aggregation of information (i.e., matching costs) over local image regions. In this paper, we present a generic representation of three commonly used cost aggregators, together with implementations of them for many-core processors. We apply typical optimizations to the kernels, which improve performance by up to two orders of magnitude. Finally, we present a performance model for the three aggregators that predicts the aggregation speed for a given pair of input images on a given architecture. Experimental results validate the model with an acceptable error margin (10.4% on average). We conclude that GPU-like many-core processors are excellent platforms for accelerating stereo matching.
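To make "aggregation of costs over local image regions" concrete, the following is a minimal sketch of one simple aggregator: a fixed square window (box filter) computed with a summed-area table. It is not necessarily one of the three aggregators studied in the paper. The function names, the absolute-difference matching cost, the border penalty of 255, and the winner-takes-all disparity selection are all assumptions made for illustration, and the code is plain sequential Python rather than a many-core kernel.

```python
from typing import List

Image = List[List[int]]


def cost_volume(left: Image, right: Image, max_disp: int) -> List[List[List[int]]]:
    """Hypothetical helper: per-pixel absolute-difference cost C[d][y][x].

    Left pixel x is compared with right pixel x - d. Pixels whose match
    falls outside the right image get a fixed penalty (an assumption).
    """
    h, w = len(left), len(left[0])
    penalty = 255
    vol = []
    for d in range(max_disp + 1):
        plane = [[penalty] * w for _ in range(h)]
        for y in range(h):
            for x in range(d, w):
                plane[y][x] = abs(left[y][x] - right[y][x - d])
        vol.append(plane)
    return vol


def box_aggregate(plane: List[List[int]], r: int) -> List[List[int]]:
    """Sum costs over a (2r+1) x (2r+1) window, clipped at the image border.

    A summed-area table makes each output O(1), independent of r.
    """
    h, w = len(plane), len(plane[0])
    # sat[y][x] holds the sum of plane[0..y-1][0..x-1].
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += plane[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y][x] = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return out


def wta_disparity(left: Image, right: Image, max_disp: int, r: int) -> List[List[int]]:
    """Aggregate every disparity plane, then pick the lowest-cost disparity per pixel."""
    agg = [box_aggregate(p, r) for p in cost_volume(left, right, max_disp)]
    h, w = len(left), len(left[0])
    return [
        [min(range(max_disp + 1), key=lambda d: agg[d][y][x]) for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Synthetic textured pair whose true disparity is 2 everywhere.
    def pattern(x: int, y: int) -> int:
        return (x * 37 + y * 11) % 97

    h, w = 6, 12
    left = [[pattern(x, y) for x in range(w)] for y in range(h)]
    right = [[pattern(x + 2, y) for x in range(w)] for y in range(h)]
    for row in wta_disparity(left, right, max_disp=4, r=1):
        print(row)
```

Each disparity plane, and each pixel within a plane, is aggregated independently of the others. That independence is one reason this step maps naturally onto data-parallel hardware such as GPU-like many-cores, although the paper's own kernels and optimizations are not reproduced here.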
