RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking

Augmented reality devices require multiple sensors to perform various tasks such as localization and tracking. Currently, popular cameras are mostly frame-based (e.g., RGB and depth), which impose high data bandwidth and power usage. Given the need for low-power and more responsive augmented reality systems, relying solely on frame-based sensors limits the algorithms that need high-frequency data from the environment. As such, event-based sensors have become increasingly popular due to their low power, bandwidth, and latency, as well as their very high-frequency data acquisition capabilities. In this paper, we propose, for the first time, to use an event-based camera to increase the speed of 3D object tracking in 6 degrees of freedom. This application requires handling very high object speeds to convey compelling AR experiences. To this end, we propose a new system which combines a recent RGB-D sensor (Kinect Azure) with an event camera (DAVIS346). We develop a deep learning approach, which combines an existing RGB-D network with a novel event-based network in a cascade fashion, and demonstrate that our approach significantly improves the robustness of a state-of-the-art frame-based 6-DOF object tracker using our RGB-D-E pipeline. Our code and our RGB-D-E evaluation dataset are available at https://github.com/lvsn/rgbde-tracking.
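Before an event-based network can be cascaded with a frame-based one, the asynchronous event stream must be converted into a dense tensor. A common choice in the event-camera literature is a spatio-temporal voxel grid that bins events by timestamp and accumulates their polarities. The sketch below is illustrative only (function and variable names are our own, not the paper's implementation) and assumes events arrive as `(timestamp, x, y, polarity)` tuples:

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate asynchronous events into a spatio-temporal voxel grid.

    events: (N, 4) array of (timestamp, x, y, polarity) rows, where
    polarity is +1 (brightness increase) or -1 (decrease).
    Returns a (num_bins, height, width) float tensor suitable as
    input to a convolutional network.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    # Normalize timestamps to the range [0, num_bins - 1]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    ps = events[:, 3]
    for ti, x, y, p in zip(t_norm, xs, ys, ps):
        b = int(ti)  # temporal bin index for this event
        grid[b, y, x] += 1.0 if p > 0 else -1.0
    return grid

# Toy example: four events accumulated into a 2-bin, 4x4 grid
events = np.array([
    [0.00, 1, 1, +1],
    [0.01, 2, 2, -1],
    [0.02, 1, 1, +1],
    [0.03, 3, 0, +1],
])
grid = events_to_voxel_grid(events, num_bins=2, height=4, width=4)
print(grid.shape)  # (2, 4, 4)
```

Once in this form, the event tensor can be processed by standard convolutional layers and its pose estimate composed with that of the RGB-D branch in the cascade.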
