Integration of CNN into a Robotic Architecture to Build Semantic Maps of Indoor Environments

In robotics, semantic mapping refers to the construction of a rich representation of the environment that includes the high-level information a robot needs to accomplish its tasks. Building a semantic map requires algorithms that process sensor data at different levels (geometric, topological, and object detection/categorization), and whose outputs must be integrated into a unified model. This paper describes a robotic architecture that builds such semantic maps for indoor environments. For this purpose, within a ROS-based ecosystem, we apply a state-of-the-art Convolutional Neural Network (CNN), specifically YOLOv3, to detect objects in images. The detection results are placed within a geometric map of the environment using several modules of the architecture: robot localization, camera extrinsic calibration, data from a depth camera, etc. We demonstrate the suitability of the proposed framework by building semantic maps of several home environments from the Robot@Home dataset, using Unity 3D both as a tool to visualize the maps and as a basis for future robotic developments.
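As an illustration of how a 2D detection could be placed in a geometric map from the ingredients listed above (robot localization, camera extrinsic calibration, and depth data), the following minimal sketch back-projects the center of a bounding box with a pinhole camera model and chains two rigid transforms. It is not the paper's implementation: the function names, the use of the median depth inside the box, the planar (x, y, yaw) robot pose, and all numeric values are assumptions made for the example.

```python
import numpy as np

def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at depth_m metres into the camera optical
    frame (pinhole model), returned in homogeneous coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m, 1.0])

def pose2d_to_matrix(x, y, yaw):
    """4x4 transform map <- robot for a planar pose (assumed: robot moves on a floor)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[0, 3], T[1, 3] = x, y
    return T

def detection_to_map(bbox, depth_image, intrinsics, T_robot_camera, robot_pose):
    """Place one detection in the map frame.

    bbox: (x_min, y_min, x_max, y_max) in pixels, e.g. from an object detector.
    depth_image: HxW array in metres, 0 where depth is missing.
    intrinsics: (fx, fy, cx, cy).
    T_robot_camera: 4x4 extrinsic calibration (robot <- camera).
    robot_pose: (x, y, yaw) from the localization module.
    Returns a 3-vector in the map frame, or None if the box has no valid depth.
    """
    x_min, y_min, x_max, y_max = bbox
    patch = depth_image[y_min:y_max, x_min:x_max]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    depth = float(np.median(valid))  # assumed heuristic, robust to background pixels
    u, v = 0.5 * (x_min + x_max), 0.5 * (y_min + y_max)
    p_cam = pixel_to_camera(u, v, depth, *intrinsics)
    T_map_robot = pose2d_to_matrix(*robot_pose)
    return (T_map_robot @ T_robot_camera @ p_cam)[:3]

# Hypothetical setup: camera 1 m above the robot base, looking forward.
# Optical frame (z forward, x right, y down) -> robot frame (x forward, y left, z up).
T_robot_camera = np.array([[0.0, 0.0, 1.0, 0.0],
                           [-1.0, 0.0, 0.0, 0.0],
                           [0.0, -1.0, 0.0, 1.0],
                           [0.0, 0.0, 0.0, 1.0]])
depth_image = np.full((480, 640), 2.0)  # flat synthetic scene, 2 m away
point = detection_to_map((300, 220, 340, 260), depth_image,
                         (500.0, 500.0, 320.0, 240.0),
                         T_robot_camera, (1.0, 1.0, np.pi / 2))
print(point)
```

In a ROS-based system the two transforms would typically be looked up from the transform tree rather than built by hand; they are written out here only to keep the sketch self-contained.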
