Learning to Fuse Multiscale Features for Visual Place Recognition

Efficient and robust visual place recognition is of great importance to autonomous mobile robots. Recent work has shown that features learned by convolutional neural networks achieve impressive performance with a compact feature size, and most of these features are pooled or aggregated from a convolutional feature map. However, convolutional filters capture only the appearance within their receptive fields and do not consider how appearance at multiple scales should be combined for place recognition. In this paper, we propose a novel method to build a multiscale feature pyramid and present two approaches that use the pyramid to improve place recognition. The first approach fuses the pyramid into a new feature map that is aware of both local and semi-global appearance. The second approach learns an attention model from the feature pyramid to weight the spatial grid cells of the original feature map. Both approaches combine the multiscale features in the pyramid to suppress confusing local features, but they tackle the problem in two different ways. Extensive experiments have been conducted on benchmark datasets with varying degrees of appearance and viewpoint variation. The results show that the proposed approaches outperform the same networks without the multiscale feature fusion and multiscale attention components. Analyses of the performance obtained with different feature pyramids are also provided.
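The two ways of using the pyramid can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not say how the pyramid is built or how the fusion and attention are trained, so the sketch assumes stride-1 average pooling with odd window sizes for the pyramid levels, and fixed per-level weights stand in for the learned parameters. All function names (`avg_pool_same`, `build_pyramid`, `fuse_pyramid`, `attention_mask`, `describe`) are hypothetical.

```python
import numpy as np


def avg_pool_same(fmap, k):
    """Average-pool a (C, H, W) map with an odd k x k window, stride 1, edge padding."""
    pad = k // 2
    padded = np.pad(fmap, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    _, h, w = fmap.shape
    out = np.zeros_like(fmap, dtype=float)
    # Sum the k*k shifted views of the padded map, then divide by the window area.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def build_pyramid(fmap, scales=(1, 3, 5)):
    """Assumed pyramid: one level per window size, all at the input resolution."""
    return [avg_pool_same(fmap, k) for k in scales]


def fuse_pyramid(pyramid, weights):
    """Approach 1 (sketch): weighted sum of levels gives a new feature map."""
    return sum(w * level for w, level in zip(weights, pyramid))


def attention_mask(pyramid, weights):
    """Approach 2 (sketch): score each grid cell from all levels, softmax over cells."""
    # Per-cell score: weighted sum of the channel-wise L2 norms of each level.
    score = sum(w * np.linalg.norm(level, axis=0) for w, level in zip(weights, pyramid))
    e = np.exp(score - score.max())  # subtract the max for numerical stability
    return e / e.sum()


def describe(fmap, mask):
    """Pool a (C, H, W) map into an L2-normalized descriptor using per-cell weights."""
    desc = (fmap * mask).sum(axis=(1, 2))
    return desc / (np.linalg.norm(desc) + 1e-12)


# Usage with a random stand-in for a CNN feature map (C=4, H=6, W=8).
rng = np.random.default_rng(0)
fmap = rng.random((4, 6, 8))
level_weights = [0.5, 0.3, 0.2]  # placeholders for learned parameters

pyramid = build_pyramid(fmap)
fused_map = fuse_pyramid(pyramid, level_weights)   # approach 1: new feature map
mask = attention_mask(pyramid, level_weights)      # approach 2: weights on the original map
descriptor = describe(fmap, mask)
```

In the sketch, fusion changes the feature map itself, while attention leaves the original map untouched and only reweights its grid cells before pooling, which mirrors the distinction the abstract draws between the two approaches.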
