Unsupervised feature learning for 3D scene labeling

This paper presents an approach for labeling objects in 3D scenes. We introduce HMP3D, a hierarchical sparse coding technique for learning features from 3D point cloud data. HMP3D classifiers are trained using a synthetic dataset of virtual scenes generated using CAD models from an online database. Our scene labeling system combines features learned from raw RGB-D images and 3D point clouds directly, without any hand-designed features, to assign an object label to every 3D point in the scene. Experiments on the RGB-D Scenes Dataset v.2 demonstrate that the proposed approach can be used to label indoor scenes containing both small tabletop objects and large furniture pieces.

[1]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[2]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[3]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[4]  Thomas A. Funkhouser,et al.  The Princeton Shape Benchmark , 2004, Proceedings Shape Modeling Applications, 2004..

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Dieter Fox,et al.  Sparse distance learning for object recognition combining RGB and depth information , 2011, 2011 IEEE International Conference on Robotics and Automation.

[7]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[8]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[9]  Quoc V. Le,et al.  High-accuracy 3D sensing for mobile manipulation: Improving object detection and door opening , 2009, 2009 IEEE International Conference on Robotics and Automation.

[10]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Gregory D. Hager,et al.  Scene parsing using a prior world model , 2011, Int. J. Robotics Res..

[12]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[13]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[14]  Martin A. Riedmiller,et al.  A learned feature descriptor for object recognition in RGB-D data , 2012, 2012 IEEE International Conference on Robotics and Automation.

[15]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[16]  Thorsten Joachims,et al.  Contextually guided semantic labeling and search for three-dimensional point clouds , 2013, Int. J. Robotics Res..

[17]  Dieter Fox,et al.  Attribute based object identification , 2013, 2013 IEEE International Conference on Robotics and Automation.

[18]  Dieter Fox,et al.  Patch Volumes: Segmentation-Based Consistent Mapping with RGB-D Cameras , 2013, 2013 International Conference on 3D Vision.

[19]  Deva Ramanan,et al.  Histograms of Sparse Codes for Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Dieter Fox,et al.  Unsupervised Feature Learning for RGB-D Based Object Recognition , 2012, ISER.

[21]  Wolfram Burgard,et al.  Instace-Based AMN Classification for Improved Object Recognition in 2D and 3D Laser Range Data , 2007, IJCAI.

[22]  Heinrich H. Bülthoff,et al.  Going into depth: Evaluating 2D and 3D cues for object classification on a new, large-scale object dataset , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[23]  Dieter Fox,et al.  Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms , 2011, NIPS.

[24]  Siddhartha S. Srinivasa,et al.  Object recognition and full pose registration from a single image for robotic manipulation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[25]  Dieter Fox,et al.  Object Recognition in 3D Point Clouds Using Web Data and Domain Adaptation , 2010, Int. J. Robotics Res..

[26]  Martial Hebert,et al.  Efficient 3-D scene analysis from streaming data , 2013, 2013 IEEE International Conference on Robotics and Automation.

[27]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[28]  Dieter Fox,et al.  Detection-based object labeling in 3D scenes , 2012, 2012 IEEE International Conference on Robotics and Automation.

[29]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  Hans Burkhardt,et al.  Harmonic Shape Histograms for 3D Shape Classification and Retrieval , 2007, MVA.

[31]  Martial Hebert,et al.  3-D scene analysis via sequenced predictions over points and regions , 2011, 2011 IEEE International Conference on Robotics and Automation.

[32]  Dieter Fox,et al.  RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments , 2010, ISER.

[33]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[34]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[35]  Nicholas Roy,et al.  Monte Carlo Pose Estimation with Quaternion Kernels and the Bingham Distribution , 2012 .

[36]  Peter Schröder,et al.  Proceedings of the 2003 Eurographics/ACM SIGGRAPH symposium on Geometry processing , 2003 .