Real-Time 6DOF Pose Relocalization for Event Cameras With Stacked Spatial LSTM Networks

We present a new method to relocalize the 6DOF pose of an event camera solely based on the event stream. Our method first creates the event image from a list of events that occurs in a very short time interval, then a Stacked Spatial LSTM Network (SP-LSTM) is used to learn the camera pose. Our SP-LSTM is composed of a CNN to learn deep features from the event images and a stack of LSTM to learn spatial dependencies in the image feature space. We show that the spatial dependency plays an important role in the relocalization task with event images and the SP-LSTM can effectively learn this information. The extensively experimental results on a publicly available dataset show that our approach outperforms recent state-of-the-art methods by a substantial margin, as well as generalizes well in challenging training/testing splits. The source code and trained models are available at https://github.com/nqanh/pose_relocalization.

[1]  Thomas Pock,et al.  Real-time panoramic tracking for event cameras , 2017, 2017 IEEE International Conference on Computational Photography (ICCP).

[2]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Davide Scaramuzza,et al.  Continuous-Time Visual-Inertial Trajectory Estimation with Event Cameras , 2017, ArXiv.

[4]  Ryad Benosman,et al.  Simultaneous Mosaicing and Tracking with an Event Camera , 2014, BMVC.

[5]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[6]  Matthias Bethge,et al.  Generative Image Modeling Using Spatial LSTMs , 2015, NIPS.

[7]  Sen Wang,et al.  VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Davide Scaramuzza,et al.  Accurate Angular Velocity Estimation With an Event Camera , 2017, IEEE Robotics and Automation Letters.

[9]  Juho Kannala,et al.  Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[10]  Nikolaos G. Tsagarakis,et al.  V2CNet: A Deep Learning Framework to Translate Videos to Commands for Robotic Manipulation , 2019, ArXiv.

[11]  Davide Scaramuzza,et al.  Event-based, 6-DOF pose tracking for high-speed maneuvers , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  Stefan Leutenegger,et al.  Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera , 2016, ECCV.

[14]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Tobi Delbrück,et al.  The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM , 2016, Int. J. Robotics Res..

[16]  Xiaolin Hu,et al.  Delving deeper into convolutional neural networks for camera relocalization , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Lei Wang,et al.  Convolutional Recurrent Neural Networks for Text Classification , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[18]  Tobi Delbruck,et al.  A 240 × 180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor , 2014, IEEE Journal of Solid-State Circuits.

[19]  Ruigang Yang,et al.  DeLS-3D: Deep Localization and Segmentation with a 3D Semantic Map , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ian D. Reid,et al.  LieNet: Real-time Monocular Object Instance 6D Pose Estimation , 2018, BMVC.

[22]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Daniel Cremers,et al.  Event-based 3D SLAM with a depth-augmented dynamic vision sensor , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[24]  Davide Scaramuzza,et al.  Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization , 2017, BMVC.

[25]  Daniel Cremers,et al.  Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Kostas Daniilidis,et al.  Event-Based Visual Inertial Odometry , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Daniel Cremers,et al.  Image-based Localization with Spatial LSTMs , 2016, ArXiv.

[28]  Wolfram Burgard,et al.  Deep Auxiliary Learning for Visual Localization and Odometry , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[29]  Davide Scaramuzza,et al.  Low-latency event-based visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[30]  Wolfram Burgard,et al.  Deep regression for monocular camera-based 6-DoF global localization in outdoor environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31]  Nikolaos G. Tsagarakis,et al.  Object Captioning and Retrieval with Natural Language , 2018, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[32]  Tobi Delbrück,et al.  A pencil balancing robot using a pair of AER dynamic vision sensors , 2009, 2009 IEEE International Symposium on Circuits and Systems.