Robust segment-based Stereo using Cost Aggregation

Introduction Most segment based stereo methods estimate disparity by modeling color segments as 3-D planes [2]. Inherently, such methods are sensitive to segmentation parameters and intolerant to segmentation errors. Two main dependencies of these methods on the underlying segmentation algorithm are: size of segments used for estimating planes, and assignment of a single plane to the whole segment. Specifically, in the case of under-segmentation, there is a higher chance of merging multiple objects (with multiple plane surfaces) into a single segment. Consequently, planes estimated using these segments are erroneous. The effect propagates to the disparity map, wherein a larger segment encompassing multiple objects is incorrectly represented by a single disparity plane. In the over-segmentation case, which gives smaller color segments, the estimated planes may be unreliable, leading to an inaccurate disparity map. Popular segment based methods try to solve this problem by re-fitting the planes on grouped segments, in an iterative manner [2]. We propose a novel algorithm for generating sub-pixel accurate disparities on a perpixel basis, thus alleviating the problems arising from methods that estimate disparities on a per-segment basis. The proposed method computes sub-pixel precision disparity maps using the recent minimum spanning tree (MST) [4] based cost aggregation framework. Since the disparity at every pixel is modeled by a plane equation, the goal is to ensure that all pixels belonging to a planar surface are labeled with the same plane equation. We show that using a reduced and refined set of planes as candidate labels in the aggregation framework ensures homogeneous labeling within a color segment. Proposed Method Our method computes an initial set of plane equations (label set) by fitting planes inside a color segment using the consistent disparities from an initial disparity map. The initial disparity map may be generated using any local or global algorithm. These plane equations form the initial label set and a matching cost volume is computed over this set for every pixel. This cost volume is aggregated using MST based cost aggregation framework and a WTA over the aggregated cost volume gives the initial labeling. The number of labels in the initial set is of the order of the number of segments, with a plane estimate for every segment. The initial labeling is used along with the color segmentation to filter and generate a reduced set of planes. This framework of plane filtering followed by re-labeling leads to a more accurate disparity map. In addition, segment analysis is also used to modify the plane matching cost. We weigh the pixel matching cost by a support factor, where the support factor is derived from the distribution of plane labels within the color segment, as follows:

[1]  Minh N. Do,et al.  Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[3]  Ruigang Yang,et al.  Spatial-Depth Super Resolution for Range Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Pushmeet Kohli,et al.  Object stereo — Joint stereo matching and object segmentation , 2011, CVPR 2011.

[5]  Carsten Rother,et al.  PatchMatch Stereo - Stereo Matching with Slanted Support Windows , 2011, BMVC.

[6]  KweonIn So,et al.  Adaptive Support-Weight Approach for Correspondence Search , 2006 .

[7]  Carsten Rother,et al.  REal-time local stereo matching using guided image filtering , 2011, 2011 IEEE International Conference on Multimedia and Expo.

[8]  D. Nistér,et al.  Stereo Matching with Color-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Li Hong,et al.  Segment-based stereo matching using graph cuts , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[10]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[11]  Uwe Franke,et al.  Improving sub-pixel accuracy for long range stereo , 2012, Comput. Vis. Image Underst..

[12]  Jan-Michael Frahm,et al.  Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Minglun Gong,et al.  Local stereo matching with 3D adaptive cost aggregation for slanted surface modeling and sub-pixel accuracy , 2008, 2008 19th International Conference on Pattern Recognition.

[14]  M. Fisher C and C , 2004 .

[15]  Carsten Rother,et al.  Fast cost-volume filtering for visual correspondence and beyond , 2011, CVPR 2011.

[16]  In-So Kweon,et al.  Adaptive Support-Weight Approach for Correspondence Search , 2006, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Qingxiong Yang,et al.  A non-local cost aggregation method for stereo matching , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Andreas Klaus,et al.  Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure , 2006, 18th International Conference on Pattern Recognition (ICPR'06).