论文信息 - Leveraging Shape Completion for 3D Siamese Tracking

Leveraging Shape Completion for 3D Siamese Tracking

Point clouds are challenging to process due to their sparsity, therefore autonomous vehicles rely more on appearance attributes than pure geometric features. However, 3D LIDAR perception can provide crucial information for urban navigation in challenging light or weather conditions. In this paper, we investigate the versatility of Shape Completion for 3D Object Tracking in LIDAR point clouds. We design a Siamese tracker that encodes model and candidate shapes into a compact latent representation. We regularize the encoding by enforcing the latent representation to decode into an object model shape. We observe that 3D object tracking and 3D shape completion complement each other. Learning a more meaningful latent representation shows better discriminatory capabilities, leading to improved tracking performance. We test our method on the KITTI Tracking set using car 3D bounding boxes. Our model reaches a 76.94% Success rate and 81.38% Precision for 3D Object Tracking, with the shape completion regularization leading to an improvement of 3% in both metrics.

[1] Sanja Fidler,et al. SurfConv: Bridging 3D and 2D Convolution for RGBD Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2] Remco C. Veltkamp,et al. Content Based 3 D Shape Retrieval , 2005 .

[3] Jianxiong Xiao,et al. Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines , 2013, 2013 IEEE International Conference on Computer Vision.

[4] Silvio Savarese,et al. Learning to Track: Online Multi-object Tracking by Decision Making , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5] Ilya Kostrikov,et al. Surface Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6] Matthias Nießner,et al. Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Huchuan Lu,et al. Structured Siamese Network for Real-Time Visual Tracking , 2018, ECCV.

[8] Majid Mirmehdi,et al. Real-Time Detection and Recognition of Road Traffic Signs , 2012, IEEE Transactions on Intelligent Transportation Systems.

[9] Bernard Ghanem,et al. Context-Aware Correlation Filter Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Bin Yang,et al. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] Ruigang Yang,et al. Learning Depth with Convolutional Spatial Propagation Network , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Leonidas J. Guibas,et al. The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[13] Ji Wan,et al. Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Jens Schneider,et al. Integration of Absolute Orientation Measurements in the KinectFusion Reconstruction Pipeline , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15] Anil K. Jain,et al. On-line signature verification, , 2002, Pattern Recognit..

[16] Nobuyuki Umetani,et al. Exploring generative 3D shapes using autoencoder networks , 2017, SIGGRAPH Asia Technical Briefs.

[17] Changsheng Xu,et al. Structural Sparse Tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Leonidas J. Guibas,et al. Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19] Steven Lake Waslander,et al. Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20] K. Madhava Krishna,et al. Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[21] Roland Siegwart,et al. A Layered Approach to People Detection in 3D Range Data , 2010, AAAI.

[22] Branko Ristic,et al. Beyond the Kalman Filter: Particle Filters for Tracking Applications , 2004 .

[23] Jiri Matas,et al. A Novel Performance Evaluation Methodology for Single-Target Trackers , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Bin Yang,et al. PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25] Jörg Stückler,et al. Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors , 2016, GCPR.

[26] Jian Yang,et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation , 2018, Pattern Recognit..

[27] Andrew W. Fitzgibbon,et al. KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[28] Jianbing Shen,et al. Triplet Loss in Siamese Network for Object Tracking , 2018, ECCV.

[29] Leonidas J. Guibas,et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[31] Remco C. Veltkamp,et al. A Survey of Content Based 3D Shape Retrieval Methods , 2004, SMI.

[32] Bohyung Han,et al. Real-Time MDNet , 2018, ECCV.

[33] Longin Jan Latecki,et al. Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34] Tianzhu Zhang,et al. 3D Part-Based Sparse Tracker with Automatic Synchronization and Registration , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Sebastian Thrun,et al. Self-supervised Monocular Road Detection in Desert Terrain , 2006, Robotics: Science and Systems.

[36] Yann LeCun,et al. Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[37] Bernard Ghanem,et al. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild , 2018, ECCV.

[38] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Lennart Svensson,et al. LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks , 2018, Robotics Auton. Syst..

[40] Luca Bertinetto,et al. Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[41] David Hsu,et al. Particle Filter Networks with Application to Visual Localization , 2018, CoRL.

[42] Junliang Xing,et al. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43] V. Matousek,et al. Signature verification using ART-2 neural network , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02..

[44] Wei Wu,et al. High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] Hao Su,et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46] Luc Van Gool,et al. Multi-view traffic sign detection, recognition, and 3D localisation , 2014, 2009 Workshop on Applications of Computer Vision (WACV).

[47] Stefan Roth,et al. MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[48] Chong Luo,et al. A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[50] Leonidas J. Guibas,et al. ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[51] Binh-Son Hua,et al. Pointwise Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52] Sinisa Todorovic,et al. Boundary Flow: A Siamese Network that Predicts Boundary Motion Without Training on Motion , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53] Kai Oliver Arras,et al. People tracking in RGB-D data with on-line boosted target models , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[54] Fawzi Nashashibi,et al. Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation , 2018, 2018 International Conference on 3D Vision (3DV).

[55] Luca Bertinetto,et al. Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56] Zhidong Deng,et al. SegStereo: Exploiting Semantic Information for Disparity Estimation , 2018, ECCV.

[57] Sertac Karaman,et al. Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[58] Thomas Brox,et al. Sparsity Invariant CNNs , 2017, 2017 International Conference on 3D Vision (3DV).

[59] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[60] Subhransu Maji,et al. SPLATNet: Sparse Lattice Networks for Point Cloud Processing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[61] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[62] James M. Rehg,et al. 3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[63] Wei Wu,et al. Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[64] Xiao-Yuan Jing,et al. Context-Aware Three-Dimensional Mean-Shift With Occlusion Handling for Robust Object Tracking in RGB-D Videos , 2019, IEEE Transactions on Multimedia.

[65] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[66] Leonidas J. Guibas,et al. Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[67] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68] Andreas Geiger,et al. Learning 3D Shape Completion Under Weak Supervision , 2018, International Journal of Computer Vision.

[69] Vladlen Koltun,et al. Tangent Convolutions for Dense Prediction in 3D , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.