Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild

Tracking and reconstructing 3D objects in cluttered scenes are key components of computer vision, robotics, and autonomous driving systems. While recent progress in implicit functions (e.g., DeepSDF) has shown encouraging results on high-quality 3D shape reconstruction, generalizing to cluttered and partially observable LiDAR data remains very challenging. In this paper, we propose to leverage the continuity in video data. We introduce a novel, unified framework that uses a DeepSDF model to simultaneously track and reconstruct 3D objects in the wild. We adapt the DeepSDF model online over the course of the video, iteratively improving the shape reconstruction, which in turn improves the tracking, and vice versa. We experiment on both the Waymo and KITTI datasets and show significant improvements over state-of-the-art methods for both tracking and shape reconstruction.
