Mining DCNN landmarks for long-term visual SLAM

Long-term visual SLAM in familiar, semi-dynamic, and partially changing environments is an important area of research in robotics. The main problem we face is how to describe a scene both discriminatively and compactly; both properties are necessary to cope with changes in appearance and with large amounts of visual information. In this study, we address these issues by mining visual experience. Our strategy is to mine a library of raw visual images, termed visual experience, to find the visual patterns that most effectively explain the input scene. From a practical point of view, our work offers three main contributions over previous work. First, it is the first application of discriminative visual features from deep convolutional neural networks (DCNN) to the task of visual landmark mining. Second, we show how to interpret a high-dimensional DCNN feature as a compact semantic representation in terms of visual words. Third, we show that our approach can recast the scene description task with any feature (including the DCNN feature) as the task of mining visual experience. Experiments on a challenging cross-domain visual place recognition task validate the efficacy of the proposed approach.
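The abstract does not spell out the mining procedure. As a rough illustration of the general idea of describing an input scene by the library entries that best explain it, the following is a minimal sketch under a simple nearest-neighbor assumption: each entry of the visual-experience library is a feature vector (for instance, a DCNN feature of an image or region), and the IDs of the k entries most similar to the query, by cosine similarity, serve as a compact visual-word code. The function names `mine_visual_words` and `scene_similarity`, the cosine similarity, and the Jaccard comparison of codes are all hypothetical choices for illustration, not the authors' method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mine_visual_words(query: Sequence[float],
                      library: Dict[int, Sequence[float]],
                      k: int = 3) -> List[int]:
    """Return the IDs of the k library entries most similar to the query.

    Hypothetical scheme: the short ID list is used as a compact
    'visual word' code in place of the high-dimensional query feature.
    Ties keep the library's insertion order (Python's sort is stable).
    """
    ranked = sorted(library, key=lambda i: cosine(query, library[i]), reverse=True)
    return ranked[:k]


def scene_similarity(words_a: Sequence[int], words_b: Sequence[int]) -> float:
    """Jaccard overlap between two visual-word codes (assumed comparison)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy 3-D "features" standing in for real DCNN features.
    library = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0],
               2: [0.0, 0.0, 1.0], 3: [1.0, 1.0, 0.0]}
    code_a = mine_visual_words([1.0, 0.1, 0.0], library, k=2)
    code_b = mine_visual_words([0.9, 0.2, 0.0], library, k=2)
    print(code_a, code_b, scene_similarity(code_a, code_b))
```

In this toy run both queries map to the code `[0, 3]`, so their similarity is 1.0, even though their raw vectors differ slightly. The point of the sketch is only that a scene can be compared through which library entries explain it rather than through its raw feature.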
