RISE: A Novel Indoor Visual Place Recogniser

This paper presents a new technique for solving the Indoor Visual Place Recognition problem from a Deep Learning perspective. It consists of an image-retrieval approach supported by a novel image similarity metric. Our work uses a 3D laser sensor mounted on a backpack, together with a calibrated spherical camera, i) to generate the data for training the deep neural network and ii) to build a database of geo-referenced images for an environment. The data collection stage is fully automatic and requires no user intervention for labelling. Thanks to the 3D laser measurements and the spherical panoramas, we can efficiently survey large indoor areas in a very short time. The underlying 3D data associated with the map allows us to define the similarity between two training images as the geometric overlap between their observed pixels. We exploit this similarity metric to effectively train a CNN that maps images into compact embeddings. The goal of the training is to ensure that the L2 distance between the embeddings of two images is small when they observe the same place and large when they observe different places. After training, similarities between a query image and the geo-referenced images in the database are efficiently retrieved by performing a nearest-neighbour search in the embedding space.
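The two core operations the abstract describes can be sketched briefly: a distance-based training objective that pulls embeddings of the same place together and pushes different places apart, and retrieval by nearest-neighbour search over database embeddings. The sketch below is illustrative only; the function names are hypothetical, and the margin-based contrastive loss is a standard stand-in assumption, not the paper's exact overlap-weighted objective.

```python
import numpy as np

def contrastive_loss(distance, same_place, margin=1.0):
    """Toy training objective (assumed form, not the paper's exact loss):
    penalise large L2 distances for same-place pairs, and distances
    smaller than `margin` for different-place pairs."""
    if same_place:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

def retrieve_nearest(query_emb, db_embs, k=1):
    """Return indices of the k database embeddings closest to the
    query under the L2 distance, mirroring the retrieval stage."""
    dists = np.linalg.norm(db_embs - query_emb, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical 2-D embeddings for three geo-referenced database images.
db = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
query = np.array([0.9, 0.1])
nearest = retrieve_nearest(query, db, k=1)  # closest place in the database
```

In the full system the embeddings come from the trained CNN and the database can be large, so the brute-force search above would typically be replaced by an approximate nearest-neighbour index.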
