Semi-supervised Learning for RGB-D Object Recognition

Conventional supervised object recognition methods have been investigated for many years. Despite their successes, there are still two suffering limitations: (1) various information of an object is represented by artificial features only derived from RGB images, (2) lots of manually labeled data is required by supervised learning. To address those limitations, we propose a new semi-supervised learning framework based on RGB and depth (RGB-D) images to improve object recognition. In particular, our framework has two modules: (1) RGB and depth images are represented by convolutional-recursive neural networks to construct high level features, respectively, (2) co-training is exploited to make full use of unlabeled RGB-D instances due to the existing two independent views. Experiments on the standard RGB-D object dataset demonstrate that our method can compete against with other state-of-the-art methods with only 20% labeled data.

[1]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[2]  Martin A. Riedmiller,et al.  A learned feature descriptor for object recognition in RGB-D data , 2012, 2012 IEEE International Conference on Robotics and Automation.

[3]  Rayid Ghani,et al.  Analyzing the effectiveness and applicability of co-training , 2000, CIKM '00.

[4]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[5]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[6]  Dieter Fox,et al.  Depth kernel descriptors for object recognition , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Dieter Fox,et al.  Unsupervised Feature Learning for RGB-D Based Object Recognition , 2012, ISER.

[10]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[11]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[12]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[13]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[14]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[15]  Andrew Y. Ng,et al.  Convolutional-Recursive Deep Learning for 3D Object Classification , 2012, NIPS.

[16]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[17]  John D. Lafferty,et al.  Learning image representations from the pixel level via hierarchical sparse coding , 2011, CVPR 2011.

[18]  Nikolaos Papanikolopoulos,et al.  Learning to Detect Moving Shadows in Dynamic Environments , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[20]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[21]  Long Zhu,et al.  Recursive segmentation and recognition templates for image parsing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[23]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.