Learning-Based Image Enhancement for Visual Odometry in Challenging HDR Environments

One of the main open challenges in visual odometry (VO) is the robustness to difficult illumination conditions or high dynamic range (HDR) environments. The main difficulties in these situations come from both the limitations of the sensors and the inability to perform a successful tracking of interest points because of the bold assumptions in VO, such as brightness constancy. We address this problem from a deep learning perspective, for which we first fine-tune a deep neural network with the purpose of obtaining enhanced representations of the sequences for VO. Then, we demonstrate how the insertion of long short term memory allows us to obtain temporally consistent sequences, as the estimation depends on previous states. However, the use of very deep networks enlarges the computational burden of the VO framework; therefore, we also propose a convolutional neural network of reduced size capable of performing faster. Finally, we validate the enhanced representations by evaluating the sequences produced by the two architectures in several state-of-art VO algorithms, such as ORB-SLAM and DSO.

[1]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[2]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4]  S. Pizer,et al.  The Image Processing Handbook , 1994 .

[5]  Karel J. Zuiderveld,et al.  Contrast Limited Adaptive Histogram Equalization , 1994, Graphics Gems.

[6]  Nima Tajbakhsh,et al.  Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? , 2016, IEEE Transactions on Medical Imaging.

[7]  Davide Scaramuzza,et al.  Active exposure control for robust visual odometry in HDR environments , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[9]  Tom Drummond,et al.  Edge landmarks in monocular SLAM , 2009, Image Vis. Comput..

[10]  Paolo Valigi,et al.  Fast robust monocular depth estimation for Obstacle Detection with fully convolutional networks , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[11]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[12]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[13]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  In-So Kweon,et al.  Auto-adjusting camera exposure for outdoor robotics using gradient information , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15]  Jitendra Malik,et al.  Recovering high dynamic range radiance maps from photographs , 1997, SIGGRAPH '08.

[16]  Stefano Soatto,et al.  Real-time feature tracking and outlier rejection with changes in illumination , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[17]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[18]  Javier González,et al.  PL-SVO: Semi-direct Monocular Visual Odometry by combining points and line segments , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[19]  Javier González,et al.  Training a Convolutional Neural Network for Appearance-Invariant Place Recognition , 2015, ArXiv.

[20]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Atsuto Maki,et al.  Towards a simulation driven stereo vision system , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[24]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[25]  Armando J. Pinho,et al.  Autonomous Configuration of Parameters in Robotic Digital Cameras , 2009, IbPRIA.

[26]  Paolo Valigi,et al.  Toward Domain Independence for Learning-Based Monocular Depth Estimation , 2017, IEEE Robotics and Automation Letters.

[27]  Huimin Lu,et al.  Camera parameters auto-adjusting technique for robust robot vision , 2010, 2010 IEEE International Conference on Robotics and Automation.

[28]  Stefano Soatto,et al.  Real-Time Feature Tracking and Outlier Rejection with Changes in Illumination , 2001, ICCV.

[29]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[30]  Vincent Lepetit,et al.  BRIEF: Computing a Local Binary Descriptor Very Fast , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[32]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[33]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[34]  Yang Zhang,et al.  HDRFusion: HDR SLAM Using a Low-Cost Auto-Exposure RGB-D Sensor , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[35]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[36]  Brett Browning,et al.  Direct Visual Odometry in Low Light Using Binary Descriptors , 2017, IEEE Robotics and Automation Letters.