TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts

Visual place recognition is a fundamental problem for many vision-based applications. Sparse-feature and deep-learning-based methods have been successful and dominant over the past decade. However, most of them do not explicitly leverage high-level semantic information to handle the challenging scenarios in which they may fail. This paper proposes a novel visual place recognition algorithm, termed TextPlace, based on scene texts in the wild. Because scene texts are high-level information that is invariant to illumination changes and, when their spatial correlation is considered, highly distinctive across places, they are beneficial for visual place recognition under extreme appearance changes and perceptual aliasing. TextPlace also takes the spatio-temporal dependence between scene texts into account for topological localization. Extensive experiments show that TextPlace achieves state-of-the-art performance, verifying the effectiveness of high-level scene texts for robust visual place recognition in urban areas.
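To make the idea of recognizing a place by its text concrete, the following is a minimal sketch and not the TextPlace algorithm itself. It assumes that a text spotter has already produced a list of strings for each image, and it compares those strings with a normalized edit distance so that small recognition errors are tolerated. The function names, the similarity threshold of 0.75, and the scoring rule are all hypothetical choices made for illustration; the spatial correlation and spatio-temporal dependence described in the abstract are not modeled here.

```python
from typing import List, Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def place_score(query_texts: Sequence[str],
                map_texts: Sequence[str],
                threshold: float = 0.75) -> float:
    """Fraction of query strings with a similar enough string in the map image.

    The threshold is an assumed value for illustration only.
    """
    if not query_texts or not map_texts:
        return 0.0
    matched = sum(
        1 for q in query_texts
        if max(text_similarity(q, m) for m in map_texts) >= threshold
    )
    return matched / len(query_texts)


def best_match(query_texts: Sequence[str],
               map_images: List[Sequence[str]],
               threshold: float = 0.75) -> Optional[int]:
    """Index of the map image with the highest text score, or None if no map."""
    if not map_images:
        return None
    scores = [place_score(query_texts, texts, threshold) for texts in map_images]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical detected strings; "bakory" simulates a recognition error.
    map_images = [["pharmacy", "exit"], ["bakery", "cafe nero"], ["bank"]]
    print(best_match(["bakory", "cafe nero"], map_images))
```

Matching on strings rather than pixel appearance is what gives this kind of sketch its tolerance to illumination change: the same sign reads the same by day and by night, provided the text spotter can still detect it.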
