Fast, Compact and Highly Scalable Visual Place Recognition through Sequence-based Matching of Overloaded Representations

Visual place recognition algorithms trade off three key characteristics: their storage footprint, their computational requirements, and their resultant performance, often expressed in terms of recall rate. Significant prior work has investigated highly compact place representations, sub-linear computational scaling and sub-linear storage scaling techniques, but these approaches have always involved a significant compromise in one or more of these regards, and have only been demonstrated on relatively small datasets. In this paper we present a novel place recognition system which, for the first time, combines ultra-compact place representations, near sub-linear storage scaling and extremely lightweight compute requirements. Our approach exploits the inherently sequential nature of much spatial data in the robotics domain and inverts the typical target criteria: intentionally coarse scalar quantization-based hashing leads to more collisions, which are then resolved by sequence-based matching. For the first time, we show how effective place recognition rates can be achieved on a new, very large 10 million place dataset, requiring only 8 bytes of storage per place and 37K unitary operations to achieve over 50% recall for matching a sequence of 100 frames, where a conventional state-of-the-art approach both consumes 1300 times more compute and fails catastrophically. We present analysis investigating the effectiveness of our hashing overload approach under varying sizes of quantized vector length, compare near-miss matches with the actual match selections, and characterise the effect of variance re-scaling of data on quantization. Resource link: https://github.com/oravus/CoarseHash
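The core idea of the abstract, coarse hashing that deliberately collides followed by sequence-based disambiguation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`coarse_hash`, `build_index`, `match_sequence`), the fixed bin edges on [-1, 1], the 4-dimensional random descriptors and the simple offset-voting scheme are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

import numpy as np


def coarse_hash(descriptors, n_levels=2):
    """Scalar-quantize every dimension into n_levels bins and return hashable keys.

    Assumption: descriptor values lie roughly in [-1, 1], so fixed bin edges suffice.
    """
    edges = np.linspace(-1.0, 1.0, n_levels + 1)[1:-1]  # interior bin edges only
    codes = np.digitize(descriptors, edges).astype(np.uint8)
    return [row.tobytes() for row in codes]


def build_index(ref_descriptors, n_levels=2):
    """Map each coarse hash key to the list of reference places that share it."""
    index = defaultdict(list)
    for place_id, key in enumerate(coarse_hash(ref_descriptors, n_levels)):
        index[key].append(place_id)
    return index


def match_sequence(index, query_descriptors, n_levels=2):
    """Vote for the reference start index that best explains the whole query sequence.

    Assumption: query and reference are traversed at the same frame rate, so query
    frame t colliding with place p is evidence for a sequence starting at p - t.
    """
    votes = Counter()
    for offset, key in enumerate(coarse_hash(query_descriptors, n_levels)):
        for place_id in index.get(key, ()):
            votes[place_id - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]  # (start index, number of supporting frames)


# Hypothetical data: 2000 places with 4-D descriptors, so at most 16 hash buckets.
rng = np.random.default_rng(0)
reference = rng.uniform(-1.0, 1.0, size=(2000, 4))
query = reference[500:520]  # a 20-frame revisit of places 500..519

index = build_index(reference)
single_frame_candidates = index[coarse_hash(query[:1])[0]]
start, support = match_sequence(index, query)
```

With so few buckets, a single query frame collides with many reference places (`single_frame_candidates` holds more than one entry), whereas accumulating votes across the sequence singles out one start index. The real system's descriptor, quantization levels and matching procedure may differ from this sketch.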
