Differentiable Event Stream Simulator for Non-Rigid 3D Tracking

This paper introduces the first differentiable simulator of event streams, i.e., streams of asynchronous brightness change signals recorded by event cameras. Our differentiable simulator enables non-rigid 3D tracking of deformable objects (such as human hands, isometric surfaces and general watertight meshes) from event streams by leveraging an analysis-by-synthesis principle. So far, event-based 3D tracking and reconstruction of non-rigid objects, such as hands and bodies, has been tackled using either explicit event trajectories or large-scale datasets. In contrast, our method requires neither such processing nor such data, and can be readily applied to incoming event streams. We show the effectiveness of our approach for various types of non-rigid objects and compare it to existing methods for non-rigid 3D tracking. In our experiments, the proposed energy-based formulations outperform competing RGB-based methods in terms of 3D errors. The source code and the new data are publicly available.
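To make the notion of "asynchronous brightness change signals" concrete, the following minimal sketch implements the idealized event generation model commonly used in the event-camera literature: a pixel emits an event whenever its log-brightness departs from the level stored at its last event by at least a contrast threshold. It is not the simulator proposed in this paper, and it is not differentiable; the function name `simulate_events`, the default threshold and the frame-based sampling of a single pixel are assumptions made for illustration only.

```python
import math


def simulate_events(intensities, threshold=0.2, eps=1e-6):
    """Return (sample_index, polarity) events for a single pixel.

    Idealized model (an illustrative assumption, not the paper's simulator):
    an event fires each time the log-brightness differs from the reference
    level stored at the previous event by at least `threshold`.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not intensities:
        return []

    events = []
    ref = math.log(intensities[0] + eps)  # reference log-brightness
    for k, value in enumerate(intensities[1:], start=1):
        diff = math.log(value + eps) - ref
        # A large brightness change can trigger several events at once.
        while abs(diff) >= threshold:
            polarity = 1 if diff > 0 else -1
            events.append((k, polarity))
            ref += polarity * threshold  # move the reference one step
            diff = math.log(value + eps) - ref
    return events


if __name__ == "__main__":
    # Brightness doubles, then drops to half of its initial value.
    print(simulate_events([1.0, 1.0, 2.0, 0.5], threshold=0.5))
```

The hard threshold test in this sketch has no useful gradient with respect to the brightness values, which gives some intuition for why an analysis-by-synthesis approach needs a differentiable formulation of event generation.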
