Spatial Hierarchical Analysis Deep Neural Network for RGB-D Object Recognition

Deep learning based object recognition methods have achieved unprecedented success in recent years. However, this level of success has yet to be achieved on multimodal RGB-D images, which can play an important role in several computer vision and robotics applications. In this paper, we present a spatial hierarchical analysis deep neural network, called ShaNet, for RGB-D object recognition. Our network consists of a convolutional neural network (CNN) and recurrent neural networks (RNNs), which analyse and learn distinctive and translationally invariant features in a hierarchical fashion. Unlike existing methods, which employ pre-trained models or rely on transfer learning, our proposed network is trained from scratch on RGB-D data. The proposed model has been tested on two publicly available RGB-D datasets: the Washington RGB-D object dataset and the 2D3D object dataset. Our experimental results show that the proposed deep neural network outperforms existing RGB-D object recognition methods.
