S3G-ARM: Highly Compressive Visual Self-localization from Sequential Semantic Scene Graph Using Absolute and Relative Measurements

In this paper, we address the problem of image sequence-based self-localization (ISS) using a new highly compressive scene representation called the sequential semantic scene graph (S3G). Recent developments in deep graph convolutional neural networks (GCNs) have enabled a highly compressive visual place classifier (VPC) that takes a scene graph as its input modality. However, in such a highly compressive application, a significant amount of information is lost in the image-to-graph mapping, which can degrade classification performance. To address this issue, we propose a pair of similarity-preserving mappings, image-to-nodes and image-to-edges, such that the nodes and edges act as absolute and relative features, respectively, that complement each other. Moreover, the proposed GCN-VPC is applied to a new task of viewpoint planning (VP) of the query image sequence, which further improves VPC performance. Experiments using the public NCLT dataset validated the effectiveness of the proposed method.
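
The split between absolute node features and relative edge features can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that each image is reduced to a single dominant semantic label (the node feature) and that each pair of consecutive images is linked by an edge carrying the quantized travel distance between them (the edge feature). The names `SequenceGraph`, `quantize`, and `build_sequence_graph`, and the parameters `bin_width` and `num_bins`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SequenceGraph:
    """Compact graph for an image sequence (illustrative structure only)."""
    node_features: List[int] = field(default_factory=list)  # absolute: one code per image
    edges: List[Tuple[int, int]] = field(default_factory=list)  # (source, target) node indices
    edge_features: List[int] = field(default_factory=list)  # relative: one code per edge


def quantize(value: float, bin_width: float, num_bins: int) -> int:
    """Map a scalar to a bin index clamped to [0, num_bins - 1]."""
    idx = int(value // bin_width)
    return max(0, min(num_bins - 1, idx))


def build_sequence_graph(
    labels: List[int],
    displacements: List[float],
    bin_width: float = 1.0,
    num_bins: int = 4,
) -> SequenceGraph:
    """Build a chain graph: one node per image, one edge per consecutive pair.

    labels: assumed dominant semantic class ID per image (absolute feature).
    displacements: assumed travel distance between consecutive images
        (relative feature), so it must hold len(labels) - 1 values.
    """
    expected = max(len(labels) - 1, 0)
    if len(displacements) != expected:
        raise ValueError(f"expected {expected} displacements, got {len(displacements)}")

    graph = SequenceGraph(node_features=list(labels))
    for i, distance in enumerate(displacements):
        graph.edges.append((i, i + 1))
        graph.edge_features.append(quantize(distance, bin_width, num_bins))
    return graph


if __name__ == "__main__":
    g = build_sequence_graph([3, 3, 7], [0.4, 2.6])
    print(g.node_features, g.edges, g.edge_features)
```

In this sketch the whole sequence is stored as a handful of small integers, which is the sense in which such a representation is highly compressive; a graph of this form could then be passed to a graph neural network library for classification.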
