Combining Semantic and Geometric Features for Object Class Segmentation of Indoor Scenes

Scene understanding is a necessary prerequisite for robots acting autonomously in complex environments. Low-cost RGB-D cameras such as Microsoft Kinect enabled new methods for analyzing indoor scenes and are now ubiquitously used in indoor robotics. We investigate strategies for efficient pixelwise object class labeling of indoor scenes that combine both pretrained semantic features transferred from a large color image dataset and geometric features, computed relative to the room structures, including a novel distance-from-wall feature, which encodes the proximity of scene points to a detected major wall of the room. We evaluate our approach on the popular NYU v2 dataset. Several deep learning models are tested, which are designed to exploit different characteristics of the data. This includes feature learning with two different pooling sizes. Our results indicate that combining semantic and geometric features yields significantly improved results for the task of object class segmentation.

[1]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Ashutosh Saxena,et al.  Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception , 2014, Robotics: Science and Systems.

[4]  Sven Behnke,et al.  Depth and height aware semantic RGB-D perception with convolutional neural networks , 2015, ESANN.

[5]  Markus Vincze,et al.  Fast semantic segmentation of 3D point clouds using a dense CRF with learned parameters , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[6]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[7]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[8]  Sven Behnke,et al.  RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[9]  Carme Torras,et al.  Realtime tracking and grasping of a moving object from range video , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[10]  Sven Behnke,et al.  CURFIL: Random Forests for Image Labeling on GPU , 2015, VISAPP.

[11]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[12]  Jana Kosecka,et al.  Semantic segmentation with heterogeneous sensor coverages , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Carme Torras,et al.  Consistent Depth Video Segmentation Using Adaptive Surface Models , 2015, IEEE Transactions on Cybernetics.

[14]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[15]  Yann LeCun,et al.  Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.

[16]  Bastian Leibe,et al.  Dense 3D semantic mapping of indoor scenes from RGB-D images , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Martial Hebert,et al.  3-D scene analysis via sequenced predictions over points and regions , 2011, 2011 IEEE International Conference on Robotics and Automation.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Yann LeCun,et al.  Indoor Semantic Segmentation using depth information , 2013, ICLR.

[20]  Sven Behnke,et al.  Fast Semantic Segmentation of RGB-D Scenes with GPU-Accelerated Deep Neural Networks , 2014, KI.

[21]  Sven Behnke,et al.  Learning depth-sensitive conditional random fields for semantic segmentation of RGB-D images , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Mohammed Bennamoun,et al.  Geometry Driven Semantic Labeling of Indoor Scenes , 2014, ECCV.

[23]  Carme Torras,et al.  Planning robot manipulation to clean planar surfaces , 2015, Eng. Appl. Artif. Intell..

[24]  Siddhartha S. Srinivasa,et al.  Manipulation planning with goal sets using constrained trajectory optimization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[25]  Nathan Silberman,et al.  Indoor scene segmentation using a structured light sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[26]  Sven Behnke,et al.  Recurrent convolutional neural networks for object-class segmentation of RGB-D video , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[27]  Jörg Stückler,et al.  Dense real-time mapping of object-class semantics from RGB-D video , 2013, Journal of Real-Time Image Processing.

[28]  Avideh Zakhor,et al.  Sensor fusion for semantic segmentation of urban scenes , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[29]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.