Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition

This paper studies the problem of RGB-D object recognition. Inspired by the success of deep convolutional neural networks (DCNNs) in AI, researchers have tried to apply them to improve the performance of RGB-D object recognition. However, a DCNN typically requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which has kept DCNNs from quickly advancing this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework that trains DCNNs effectively from very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which guides the DCNNs to learn from unlabeled RGB-D data by exploiting the complementary cues of the RGB and depth data in object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive object recognition performance compared with state-of-the-art results reported by fully supervised methods.
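To make the co-training idea concrete, the following is a minimal sketch of generic two-view co-training, not the diversity-preserving algorithm proposed in the paper, whose details are not given in this abstract. It assumes a nearest-centroid classifier as a stand-in for each per-modality DCNN, synthetic 2-D features in place of RGB and depth images, and at least one labeled sample per class. All function names and parameters (`fit_centroids`, `predict_proba`, `co_train`, `make_demo`, `rounds`, `k`) are hypothetical.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # One mean vector per class; a stand-in for a per-modality network.
    # Assumes every class has at least one labeled sample.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    # Softmax over negative squared distances to the class centroids.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def co_train(X_rgb, X_depth, y, labeled_mask, n_classes, rounds=5, k=10):
    """Generic two-view co-training; y is read only where labeled_mask is True."""
    labels = np.where(labeled_mask, y, -1)   # -1 marks "no label yet"
    labeled = labeled_mask.copy()
    for _ in range(rounds):
        # Each view takes a turn teaching the shared labeled set.
        for X_view in (X_rgb, X_depth):
            pool = np.flatnonzero(~labeled)
            if pool.size == 0:
                break
            centroids = fit_centroids(X_view[labeled], labels[labeled], n_classes)
            proba = predict_proba(centroids, X_view[pool])
            # Pseudo-label the k unlabeled samples this view is most sure about.
            top = np.argsort(-proba.max(axis=1))[:k]
            labels[pool[top]] = proba[top].argmax(axis=1)
            labeled[pool[top]] = True
    return labels, labeled

def make_demo(n=200, seed=0):
    # Two synthetic "modalities" of the same n samples, 5% of them labeled.
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X_rgb = rng.normal(size=(n, 2)) + 5.0 * y[:, None]
    X_depth = rng.normal(size=(n, 2)) - 5.0 * y[:, None]
    labeled_mask = np.zeros(n, dtype=bool)
    labeled_mask[np.r_[0:5, n // 2:n // 2 + 5]] = True
    return X_rgb, X_depth, y, labeled_mask

if __name__ == "__main__":
    X_rgb, X_depth, y, mask = make_demo()
    labels, labeled = co_train(X_rgb, X_depth, y, mask, n_classes=2)
    print("labeled samples after co-training:", int(labeled.sum()))
```

In this sketch both views add pseudo-labels to one shared labeled set, so a sample that one modality classifies confidently becomes training data for the other. Plain confidence-based selection of this kind can favor easy or redundant samples, which is presumably the kind of issue a diversity-preserving variant aims to address.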
