SLAM-OR: Simultaneous Localization, Mapping and Object Recognition Using Video Sensors Data in Open Environments from the Sparse Points Cloud

In this paper, we propose a novel approach that performs simultaneous localization and mapping (SLAM) together with object recognition (OR) using visual sensor data in open environments, and that is capable of working on sparse point clouds. In the proposed algorithm, ORB-SLAM uses the current and previous frames from a monocular video sensor to determine the observer's position and a cloud of points that represent objects in the environment, while a deep neural network uses the current frame to detect and recognize objects. In the next step, the sparse point cloud returned by the SLAM algorithm is compared with the areas recognized by the OR network. Because each point of the 3D map has a counterpart in the current frame, the points can be filtered to keep only those that fall within the areas recognized by the OR algorithm. A clustering algorithm then determines the areas in which these points are densely distributed in order to find the spatial positions of the objects detected by OR. Next, using a heuristic based on principal component analysis (PCA), we estimate the bounding boxes of the detected objects. The main novelties of our solution are the image processing pipeline, which uses sparse point clouds generated by SLAM to determine the positions of objects recognized by a deep neural network, and the PCA-based heuristic. In contrast to state-of-the-art approaches, our algorithm does not require additional calculations, such as the generation of dense point clouds for object positioning, which greatly simplifies the task. We evaluated our approach on a large benchmark dataset using various state-of-the-art OR architectures (YOLO, MobileNet, RetinaNet) and clustering algorithms (DBSCAN and OPTICS), obtaining promising results. Both our source code and evaluation datasets are available for download, so our results can be easily reproduced.
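The three post-detection steps described above (filtering map points by the recognized image area, density-based clustering, and PCA-based box estimation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the array layout, the `eps` and `min_samples` parameters, and the compact NumPy version of DBSCAN are all assumptions made for illustration, and the exact PCA heuristic used in the paper may differ from the plain oriented box computed here.

```python
import numpy as np

def points_in_detection(points_3d, pixels_2d, box):
    """Keep 3D map points whose image counterpart lies inside a 2D detection box.

    points_3d: (N, 3) map points; pixels_2d: (N, 2) their pixel coordinates;
    box: (x_min, y_min, x_max, y_max) in pixels (hypothetical layout).
    """
    x_min, y_min, x_max, y_max = box
    inside = (
        (pixels_2d[:, 0] >= x_min) & (pixels_2d[:, 0] <= x_max)
        & (pixels_2d[:, 1] >= y_min) & (pixels_2d[:, 1] <= y_max)
    )
    return points_3d[inside]

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Each neighborhood includes the point itself.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # grow the cluster outward from core points only
            j = stack.pop()
            if not is_core[j]:
                continue
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels

def pca_bounding_box(points):
    """Oriented box aligned with the principal axes of a point cluster.

    Returns (center, axes, extents); the columns of `axes` are the principal
    directions, ordered by decreasing variance.
    """
    mean = points.mean(axis=0)
    centered = points - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]
    projected = centered @ axes
    lo, hi = projected.min(axis=0), projected.max(axis=0)
    center = mean + axes @ ((lo + hi) / 2.0)
    return center, axes, hi - lo
```

In this sketch, the filtered points would be clustered first, and a box would then be fitted to the points of a chosen cluster, for example `pca_bounding_box(kept[labels == 0])`. Points labeled `-1` are treated as noise and excluded from the box.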
