SAE-RNN Deep Learning for RGB-D Based Object Recognition

RGB-D images are multimodal data. Previous work has shown that using color and depth images together can substantially improve RGB-D based object recognition accuracy. However, most existing methods either feed all modalities into the network as a single input, ignoring modality-specific information, or train a first-layer representation for each modality separately and concatenate the results, ignoring correlations between modalities. In this paper, we use a variant of the sparse auto-encoder (SAE) that can specify how mode-sparse or mode-dense the learned features should be, that is, whether each feature draws on few or many modalities. We propose a new deep learning network that combines this variant SAE with recursive neural networks (RNNs). With it, we obtain highly discriminative features and achieve state-of-the-art performance on a standard RGB-D object dataset.
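As a rough illustration of what controlling mode-sparsity could look like, the sketch below adds a per-modality group penalty to an ordinary auto-encoder reconstruction loss. This is a minimal sketch under stated assumptions, not the formulation used in the paper: the group-lasso style penalty, the function names `modality_group_penalty` and `sae_loss`, the sigmoid encoder with a linear decoder, and the weight `lam` are all hypothetical choices made for the example.

```python
import numpy as np


def modality_group_penalty(W, groups):
    """Group penalty over (hidden unit, modality) weight blocks.

    W      : (n_hidden, n_inputs) encoder weights.
    groups : list of index lists, one per modality (e.g. color, depth).

    Assumption: a group-lasso style penalty. A hidden unit that reads from
    a single modality pays for one block only, so a larger penalty weight
    pushes features toward being mode-sparse.
    """
    total = 0.0
    for idx in groups:
        # L2 norm of each hidden unit's weights into this modality.
        total += np.linalg.norm(W[:, idx], axis=1).sum()
    return total


def sae_loss(X, W, b, W_dec, b_dec, groups, lam=1e-3):
    """Reconstruction error plus the modality group penalty (hypothetical)."""
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))   # sigmoid encoder, (n, n_hidden)
    X_hat = H @ W_dec.T + b_dec                # linear decoder, (n, n_inputs)
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    return recon + lam * modality_group_penalty(W, groups)


# Toy input layout: columns 0-1 are "color", columns 2-3 are "depth".
groups = [[0, 1], [2, 3]]
W_single = np.array([[3.0, 4.0, 0.0, 0.0]])   # unit uses one modality
W_mixed = np.array([[3.0, 0.0, 4.0, 0.0]])    # same weights spread over two
print(modality_group_penalty(W_single, groups))
print(modality_group_penalty(W_mixed, groups))
```

In this toy setup the two weight vectors have the same overall L2 norm, but the one spread across both modalities receives the larger penalty. Raising `lam` therefore favors mode-sparse features, while lowering it leaves room for mode-dense ones.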
