Object Tracking by Jointly Exploiting Frame and Event Domain

Inspired by the complementarity between conventional frame-based and bio-inspired event-based cameras, we propose a multi-modal based approach to fuse visual cues from the frameand event-domain to enhance the single object tracking performance, especially in degraded conditions (e.g., scenes with high dynamic range, low light, and fastmotion objects). The proposed approach can effectively and adaptively combine meaningful information from both domains. Our approach’s effectiveness is enforced by a novel designed cross-domain attention schemes, which can effectively enhance features based on selfand cross-domain attention schemes; The adaptiveness is guarded by a specially designed weighting scheme, which can adaptively balance the contribution of the two domains. To exploit event-based visual cues in single-object tracking, we construct a largescale frame-event-based dataset, which we subsequently employ to train a novel frame-event fusion based model. Extensive experiments show that the proposed approach outperforms state-of-the-art frame-based tracking methods by at least 10.4% and 11.9% in terms of representative success rate and precision rate, respectively. Besides, the effectiveness of each key component of our approach is evidenced by our thorough ablation study.

[1]  Jiri Matas,et al.  How to Make an RGBD Tracker? , 2018, ECCV Workshops.

[2]  Luc Van Gool,et al.  Probabilistic Regression for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Zhe,et al.  The Visual Object Tracking VOT2015 Challenge Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[4]  Chenglong Li,et al.  Multi-Adapter RGBT Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[5]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[6]  Ying Cui,et al.  SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[8]  Hongkai Wen,et al.  Event-Stream Representation for Human Gaits Identification Using Deep Neural Networks , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Jianxiong Xiao,et al.  Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Stephan Schraml,et al.  Spatiotemporal multiple persons tracking using Dynamic Vision Sensor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[11]  Hanzi Wang,et al.  End-to-end Learning of Object Motion Estimation from Retinal Events for Event-based Object Tracking , 2020, AAAI.

[12]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Eduardo Ros,et al.  Real-Time Clustering and Multi-Target Tracking Using Event-Based Sensors , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  Yuqing Gao,et al.  Robust Fusion of Color and Depth Data for RGB-D Target Tracking Using Adaptive Range-Invariant Depth Models and Spatio-Temporal Consistency Constraints , 2018, IEEE Transactions on Cybernetics.

[15]  Davide Scaramuzza,et al.  End-to-End Learning of Representations for Asynchronous Event-Based Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Fahad Shahbaz Khan,et al.  Multi-Modal Fusion for End-to-End RGB-T Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[17]  L. Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Rongrong Ji,et al.  Siamese Box Adaptive Network for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Shengping Zhang,et al.  Robust Collaborative Discriminative Learning for RGB-Infrared Tracking , 2018, AAAI.

[20]  Davide Scaramuzza,et al.  Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization , 2017, BMVC.

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Davide Scaramuzza,et al.  Event-based Asynchronous Sparse Convolutional Networks , 2020, ECCV.

[23]  Ning An,et al.  Online RGB-D tracking via detection-learning-segmentation , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[24]  Tobi Delbrück,et al.  EV-IMO: Motion Segmentation Dataset and Learning Pipeline for Event Cameras , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[25]  Yiannis Aloimonos,et al.  Event-Based Moving Object Detection and Tracking , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Garrick Orchard,et al.  HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Huchuan Lu,et al.  Visual Tracking via Adaptive Spatially-Regularized Correlation Filters , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Hongjie Liu,et al.  DVS Benchmark Datasets for Object Tracking, Action Recognition, and Object Recognition , 2016, Front. Neurosci..

[29]  Xiao Wang,et al.  Dense Feature Aggregation and Pruning for RGBT Tracking , 2019, ACM Multimedia.

[30]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Bo Dong,et al.  Multi-domain collaborative feature representation for robust visual object tracking , 2021, The Visual Computer.

[32]  Kostas Daniilidis,et al.  Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yan Lu,et al.  Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Tobi Delbruck,et al.  A 240 × 180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor , 2014, IEEE Journal of Solid-State Circuits.

[35]  Luc Van Gool,et al.  Know Your Surroundings: Exploiting Scene Information for Object Tracking , 2020, ECCV.

[36]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Majid Mirmehdi,et al.  Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling , 2015, BMVC.

[38]  Ling Shao,et al.  CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers , 2020, ECCV.

[39]  Bo Dong,et al.  Depth-Aware Mirror Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Zhipeng Zhang,et al.  Deeper and Wider Siamese Networks for Real-Time Visual Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Ling Zhou,et al.  Cross-Modal Pattern-Propagation for RGB-T Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Huchuan Lu,et al.  GradNet: Gradient-Guided Network for Visual Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Changsheng Xu,et al.  Learning Multi-Task Correlation Particle Filters for Visual Tracking , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Chiara Bartolozzi,et al.  Event-Based Vision: A Survey , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Narciso García,et al.  Event-Based Vision Meets Deep Learning on Steering Prediction for Self-Driving Cars , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Yan Huang,et al.  Cross-Modal Ranking with Soft Consistency and Noisy Labels for Robust RGB-T Tracking , 2018, ECCV.

[48]  Zuoxin Li,et al.  SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines , 2020, AAAI.

[49]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.