EnvSLAM: Combining SLAM Systems and Neural Networks to Improve the Environment Fusion in AR Applications

Augmented Reality (AR) has increasingly benefited from the use of Simultaneous Localization and Mapping (SLAM) systems. This technology has enabled developers to create markerless AR applications, but these applications lack a semantic understanding of their environment. Including this information would allow AR applications to react to their surroundings more realistically. To gain semantic knowledge, research focus has shifted in recent years toward fusing SLAM systems with neural networks, giving birth to the field of Semantic SLAM. Building on existing research, this paper aimed to create a SLAM system that generates a 3D map using ORB-SLAM2 and enriches it with semantic knowledge originating from the Fast-SCNN network. The key novelty of our approach is a new method for improving the predictions of neural networks, employed to offset the loss of accuracy introduced by efficient real-time models. Exploiting sensor information provided by a smartphone, GPS coordinates are used to query the OpenStreetMap database. The returned information is used to determine which classes are currently absent from the environment, so that they can be removed from the network’s prediction with the goal of improving its accuracy. We achieved 87.40% Pixel Accuracy with Fast-SCNN on our custom version of COCO-Stuff and showed an improvement from involving GPS data on our self-made smartphone dataset, resulting in 90.24% Pixel Accuracy. With smartphone use in mind, the implementation aimed to find a trade-off between accuracy and efficiency, allowing the system to achieve an unprecedented speed. To this end, the system was carefully designed with a strong focus on lightweight neural networks. This enabled the creation of an above real-time Semantic SLAM system that we called EnvSLAM (Environment SLAM).
Our extensive evaluation demonstrates the efficiency of the system's features and its ability to operate above real time (48.1 frames per second with an input image resolution of 640 × 360 pixels). Moreover, the evaluation indicates that the GPS integration effectively improves the network’s prediction accuracy.
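
The idea of removing absent classes from the network's prediction can be illustrated with a minimal sketch. It is not the EnvSLAM implementation: it assumes the segmentation network outputs a per-class score map of shape (classes, height, width), and that the set of classes present near the current GPS position has already been derived from the OpenStreetMap query. The function names `suppress_absent_classes` and `pixel_accuracy` are hypothetical, and Pixel Accuracy is taken here in its usual sense as the fraction of correctly labeled pixels.

```python
import numpy as np


def suppress_absent_classes(scores, present_classes):
    """Return a label map that ignores classes assumed to be absent.

    scores: array of shape (C, H, W) with one score map per class.
    present_classes: iterable of class indices considered present
        (assumed to come from a map query; hypothetical input).
    """
    num_classes = scores.shape[0]
    present = np.zeros(num_classes, dtype=bool)
    present[list(present_classes)] = True
    if not present.any():
        # No usable location information: keep the unfiltered prediction.
        return scores.argmax(axis=0)
    # Absent classes get a score of -inf so they can never win the argmax.
    masked = np.where(present[:, None, None], scores, -np.inf)
    return masked.argmax(axis=0)


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground truth."""
    return float((pred == target).mean())


# Toy example: class 2 has the highest score everywhere but is marked absent.
scores = np.array([
    [[0.1, 0.6], [0.2, 0.3]],   # class 0
    [[0.3, 0.1], [0.1, 0.4]],   # class 1
    [[0.9, 0.9], [0.9, 0.9]],   # class 2 (assumed absent at this location)
])
pred = suppress_absent_classes(scores, present_classes={0, 1})
target = np.array([[1, 0], [0, 0]])
accuracy = pixel_accuracy(pred, target)
```

In this toy case the unfiltered prediction would assign class 2 to every pixel, whereas the filtered prediction falls back to the best-scoring class among those marked as present.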
