Indoor Semantic Mapping with Efficient Convolutional Neural Networks for Resource-constrained SLAM System

Scene parsing and robotic manipulation using visual sensing are of significant importance for mobile and robotics related applications. And increasing efforts are being made to associate semantic concepts to geometric entities in robot's surroundings. However, they are not well suitable for resource-constrained and mobile platforms. Toward an efficient semantic mapping solution, we propose a specifically light-weight segmentation network for building a dense 3D semantic map in real-time. The propose architecture is based on depth-wise convolution and channel shuffle to reduce computation cost. An improved Atrous Spatial Pyramid Pooling (ASPP) and encoder-decoder structures are presented and multi-scale features are merged for better prediction. Experiments are compared with other methods to prove the effectiveness and superiority.

[1]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[3]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[4]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[5]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[6]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[7]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[9]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[10]  Jörg Stückler,et al.  Multi-view deep learning for consistent semantic mapping with RGB-D cameras , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[11]  Stefan Leutenegger,et al.  ElasticFusion: Real-time dense SLAM and light source estimation , 2016, Int. J. Robotics Res..

[12]  Steve Bourgeois,et al.  Large-scale, drift-free SLAM using highly robustified building model constraints , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[13]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  George J. Pappas,et al.  Localization from semantic observations via the matrix permanent , 2016, Int. J. Robotics Res..

[15]  Dima Damen,et al.  Detecting Carried Objects in Short Video Sequences , 2008, ECCV.

[16]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Sean L. Bowman,et al.  Probabilistic data association for semantic SLAM , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).