Scene Retrieval for Contextual Visual Mapping

Visual navigation localizes a query place image against a reference database of place images, also known as a ‘visual map’. Localization accuracy requirements for specific areas of the visual map, ‘scene classes’, vary according to the context of the environment and task. State-of-the-art visual mapping is unable to reflect these requirements by explicitly targeting scene classes for inclusion in the map. Four different scene classes, including pedestrian crossings and stations, are identified in each of the Nordland and St. Lucia datasets. Instead of re-training separate scene classifiers, which struggle with these overlapping scene classes, we make our first contribution: defining the problem of ‘scene retrieval’. Scene retrieval extends image retrieval to the classification of scenes defined at test time by associating a single query image with reference images of scene classes. Our second contribution is a triplet-trained convolutional neural network (CNN) that addresses this problem and increases scene classification accuracy by up to 7% against state-of-the-art networks pre-trained for scene recognition. Our third contribution is an algorithm, ‘DMC’, that combines our scene classification with distance and memorability for visual mapping. Our analysis shows that DMC includes 64% more images of our chosen scene classes in a visual map than distance interval mapping alone. Finally, the state-of-the-art visual place descriptors AMOS-Net, Hybrid-Net and NetVLAD are used to show that DMC improves scene class localization accuracy by a mean of 3% and localization accuracy of the remaining map images by a mean of 10% across both datasets.
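As a rough illustration of the scene retrieval idea, the sketch below pairs a standard hinge triplet loss with nearest-reference classification of a single query embedding. It is not the network or training setup described above: the function names, the margin value, the use of Euclidean distance, and the toy 2-D embeddings are all assumptions made for the example, and in practice the embeddings would come from a trained CNN.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on squared Euclidean distances.

    The margin of 0.2 is an illustrative assumption, not a value from the text.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)


def retrieve_scene(query, ref_embeddings, ref_labels):
    """Assign the query the scene class of its closest reference embedding."""
    dists = np.linalg.norm(ref_embeddings - query, axis=1)
    return ref_labels[int(np.argmin(dists))]


# Toy reference set: scene classes are defined at test time purely by
# which reference embeddings are supplied (hypothetical values).
refs = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
labels = ["station", "station", "crossing", "crossing"]

query = np.array([0.95, 1.0])
scene = retrieve_scene(query, refs, labels)
```

Because classification reduces to a nearest-neighbor lookup against the supplied references, adding or changing a scene class in this sketch only requires new reference embeddings rather than re-training a classifier.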
