A Lightweight and Detector-Free 3D Single Object Tracker on Point Clouds

Recent works on 3D single object tracking treat the task as target-specific 3D detection, commonly employing an off-the-shelf 3D detector for tracking. However, accurate target-specific detection is non-trivial because the point clouds of objects in raw LiDAR scans are usually sparse and incomplete. In this paper, we address this issue by explicitly leveraging temporal motion cues and propose DMT, a Detector-free Motion-prediction-based 3D Tracking network that completely removes the need for complicated 3D detectors and is lighter, faster, and more accurate than previous trackers. Specifically, a motion prediction module is first introduced to estimate a potential target center in the current frame in a point-cloud-free manner. Then, an explicit voting module is proposed to directly regress the 3D box from the estimated target center. Extensive experiments on the KITTI and nuScenes datasets demonstrate that, without applying any complicated 3D detector, DMT still achieves better performance (~10% improvement on nuScenes) and a faster tracking speed (72 FPS) than state-of-the-art approaches. Our code is released at https://github.com/jimmy-dq/DMT.
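
The two-stage idea (predict a center from motion alone, then vote for the box around it) can be illustrated with a minimal sketch. This is not DMT's implementation and makes several labeled assumptions: a constant-velocity extrapolation over past box centers stands in for the learned motion prediction module; `offset_fn` is a hypothetical stand-in for a learned per-point offset regressor in the voting module; the box size is assumed to be carried over from the first frame; and all names (`predict_center`, `neighbors`, `vote_center`, `track_step`, `Box3D`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box3D:
    center: Vec3
    size: Vec3  # (length, width, height); assumption: fixed from the first frame


def predict_center(history: Sequence[Vec3]) -> Vec3:
    """Point-cloud-free guess of the current center from past centers.

    Assumption: constant-velocity extrapolation stands in for a learned
    motion prediction module; only past box centers are used, no points.
    """
    if not history:
        raise ValueError("need at least one past center")
    if len(history) == 1:
        return tuple(history[-1])
    prev, last = history[-2], history[-1]
    return tuple(2 * b - a for a, b in zip(prev, last))


def neighbors(points: Sequence[Vec3], center: Vec3, radius: float) -> List[Vec3]:
    """Keep only points within `radius` of the predicted center (hypothetical helper)."""
    return [p for p in points if math.dist(p, center) <= radius]


def vote_center(points: Sequence[Vec3], offsets: Sequence[Vec3]) -> Vec3:
    """Each point votes for point + offset; the votes are averaged."""
    votes = [tuple(pi + oi for pi, oi in zip(p, o)) for p, o in zip(points, offsets)]
    n = len(votes)
    return tuple(sum(v[k] for v in votes) / n for k in range(3))


def track_step(
    history: Sequence[Vec3],
    points: Sequence[Vec3],
    offset_fn: Callable[[Vec3, Vec3], Vec3],
    size: Vec3,
    radius: float = 2.0,
) -> Box3D:
    guess = predict_center(history)
    local = neighbors(points, guess, radius)
    if not local:
        # Sparse scan: fall back to the motion-only (point-cloud-free) estimate.
        return Box3D(center=guess, size=size)
    offsets = [offset_fn(p, guess) for p in local]
    return Box3D(center=vote_center(local, offsets), size=size)


# Toy usage: the "regressor" is hypothetical and simply points at a known center.
history = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]  # past target centers
scan = [(2.2, 1.0, 0.0), (1.8, 1.2, 0.0), (10.0, 10.0, 0.0)]
true_center = (2.1, 1.0, 0.0)


def toy_offsets(p: Vec3, guess: Vec3) -> Vec3:
    return tuple(t - pi for t, pi in zip(true_center, p))


box = track_step(history, scan, toy_offsets, size=(4.0, 1.8, 1.5))
print(box.center)  # close to (2.1, 1.0, 0.0), up to floating-point rounding
```

The fallback branch mirrors the motivation in the abstract: when the scan near the target is too sparse to produce any votes, the motion-only estimate still yields a box, so the tracker never depends on a detector finding the object.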
