How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change

Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, make them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change because they assume photometric consistency, an assumption that is commonly violated in practice. In this letter, we propose to mitigate this problem by training deep convolutional encoder–decoder models to transform images of a scene such that they correspond to a previously seen canonical appearance. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline. This yields improved visual odometry accuracy under time-varying illumination, as well as improved metric relocalization performance under illumination change, where conventional methods normally fail. We further provide a preliminary investigation of transfer learning from synthetic to real environments in a localization context.
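
To make the photometric consistency assumption concrete, the following is a minimal sketch and not the letter's method. It assumes a toy global gain-and-bias illumination model, and a closed-form least-squares affine fit stands in for the learned encoder-decoder. All function names and values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. The illumination model, the affine stand-in for
# the learned transformation, and all names below are assumptions.

def photometric_residuals(reference, current):
    """Per-pixel intensity differences between two aligned grayscale images."""
    return [c - r for r, c in zip(reference, current)]

def mean_abs(values):
    """Mean absolute value of a list of residuals."""
    return sum(abs(v) for v in values) / len(values)

def change_illumination(image, gain, bias):
    """Toy global illumination change: I' = gain * I + bias."""
    return [gain * p + bias for p in image]

def fit_gain_bias(source, target):
    """Least-squares gain and bias mapping source intensities onto target."""
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(source, target))
    var = sum((s - mean_s) ** 2 for s in source)
    gain = cov / var
    return gain, mean_t - gain * mean_s

# A flattened four-pixel "image" in its canonical appearance.
canonical = [0.2, 0.4, 0.6, 0.8]

# The same scene under a dimmer, hazier illumination condition.
observed = change_illumination(canonical, gain=0.5, bias=0.1)

# Photometric consistency is violated: residuals are non-zero even though
# the camera has not moved.
error_before = mean_abs(photometric_residuals(canonical, observed))

# Map the observed image back toward the canonical appearance.
gain, bias = fit_gain_bias(observed, canonical)
restored = change_illumination(observed, gain, bias)
error_after = mean_abs(photometric_residuals(canonical, restored))

print(f"mean |residual| before: {error_before:.3f}")
print(f"mean |residual| after:  {error_after:.3f}")
```

In this toy setting the residual vanishes after the correction because the illumination change is exactly affine and global. Real appearance change (shadows, specularities, moving light sources) is spatially varying, which is the motivation for learning the transformation rather than fitting two scalars.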
