Learning boundaries with color and depth

To enable high-level understanding of a scene, it is important to understand the occlusion and connected boundaries of objects in the image. In this paper, we propose a new framework for inferring boundaries from color and depth information. Even with depth information, finding and classifying boundaries is not a trivial task: real-world depth images are noisy, especially at object boundaries, which is precisely where our task focuses. Our approach combines features from the color image, which is sharp at object boundaries, with features from the depth image, which provides geometric cues, to detect boundaries and classify them as occlusion or connected boundaries. We propose depth features based on surface fitting from sparse point clouds, and perform inference with a Conditional Random Field. One advantage of our approach is that occlusion and connected boundaries are identified with a single, common model. Experiments show that our mid-level color and depth features outperform features computed from either depth or color alone, and that our method surpasses baseline boundary detection methods.
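As a rough illustration (not the authors' implementation), the sketch below shows one way a surface-fitting depth feature of this kind could be computed: back-project the depth map with assumed pinhole intrinsics, fit least-squares planes to the 3D points on either side of a candidate boundary pixel, and compare the fitted normals, the depth gap, and the fit residuals. The function names, intrinsics, and neighborhood size are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a
# surface-fitting depth feature: fit a plane to the 3D points on each
# side of a candidate boundary pixel and compare the two fits.
import numpy as np

def backproject(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth map (meters) into an HxWx3 point cloud
    using assumed pinhole intrinsics (Kinect-like defaults)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.dstack([x, y, depth])

def fit_plane(points):
    """Least-squares plane fit; returns (unit normal, RMS residual)."""
    centered = points - points.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1], s[-1] / np.sqrt(len(points))

def boundary_features(cloud, r, c, half=4):
    """Compare plane fits above and below pixel (r, c), i.e. for a
    horizontal candidate boundary; vertical edges would swap the axes."""
    r0, r1 = max(r - half, 0), min(r + half + 1, cloud.shape[0])
    c0, c1 = max(c - half, 0), min(c + half + 1, cloud.shape[1])
    top = cloud[r0:r, c0:c1].reshape(-1, 3)
    bot = cloud[r + 1:r1, c0:c1].reshape(-1, 3)
    top, bot = top[top[:, 2] > 0], bot[bot[:, 2] > 0]    # drop missing depth
    if len(top) < 3 or len(bot) < 3:
        return None                                      # too few valid points
    n1, res1 = fit_plane(top)
    n2, res2 = fit_plane(bot)
    angle = np.arccos(np.clip(abs(n1 @ n2), 0.0, 1.0))   # surface orientation change
    gap = abs(top[:, 2].mean() - bot[:, 2].mean())       # depth discontinuity
    return angle, gap, max(res1, res2)                   # geometric boundary cues
```

In the paper, depth cues of this kind are combined with color edge features and fed to a Conditional Random Field that jointly labels occlusion and connected boundaries; the sketch above only illustrates the geometric side.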
