Indoor object recognition using pre-trained convolutional neural network

Indoor object recognition is a key task for the indoor navigation of mobile robots. In this paper, we propose a pipeline for indoor object detection based on a convolutional neural network (CNN). We first pre-train a CNN model offline using both the public Indoor Dataset and a private frames-of-videos (FoV) dataset. The input video is then parsed into frame images, and a selective search process extracts regions of interest (RoIs) from each frame. The extracted RoIs are classified into candidates using the pre-trained deep model, and the candidates in neighboring frames are refined using detection fusion. Finally, the annotated frames are merged to create the output video. Experiments show that our design performs efficiently on indoor object detection.
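The abstract does not specify how detection fusion between neighboring frames is carried out. As a minimal sketch of one plausible rule, the Python below keeps a candidate only if a same-label candidate in the adjacent frame overlaps it sufficiently, and averages the two scores. The names `iou` and `fuse_with_neighbor`, the 0.5 overlap threshold, and the score-averaging rule are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), hypothetical layout
Detection = Tuple[Box, str, float]        # (box, class label, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_with_neighbor(
    current: List[Detection],
    neighbor: List[Detection],
    iou_thresh: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Detection]:
    """Keep detections in `current` that a same-label detection in
    `neighbor` supports, averaging the score with the best supporter."""
    fused: List[Detection] = []
    for box, label, score in current:
        support = [
            n_score
            for n_box, n_label, n_score in neighbor
            if n_label == label and iou(box, n_box) >= iou_thresh
        ]
        if support:
            fused.append((box, label, (score + max(support)) / 2.0))
    return fused


# Example: the chair appears in both frames and is kept;
# the door has no support in the neighboring frame and is dropped.
frame_t = [((10, 10, 50, 50), "chair", 0.9), ((100, 100, 120, 120), "door", 0.6)]
frame_t1 = [((12, 12, 52, 52), "chair", 0.7)]
refined = fuse_with_neighbor(frame_t, frame_t1)
```

Under this assumed rule, a candidate that appears in only one frame is treated as a likely false positive. A real system might instead lower its score rather than discard it.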
