Auto4D: Learning to Label 4D Objects from Sequential Point Clouds

In the past few years we have seen great advances in 3D object detection thanks to deep learning methods. However, they typically rely on large amounts of high-quality labels to achieve good performance, which often require timeconsuming and expensive work by human annotators. To address this we propose an automatic annotation pipeline that generates accurate object trajectories in 3D (i.e., 4D labels) from LiDAR point clouds. Different from previous works that consider single frames at a time, our approach directly operates on sequential point clouds to combine richer object observations. The key idea is to decompose the 4D label into two parts: the 3D size of the object, and its motion path describing the evolution of the object’s pose through time. More specifically, given a noisy but easy-toget object track as initialization, our model first estimates the object size from temporally aggregated observations, and then refines its motion path by considering both framewise observations as well as temporal motion cues. We validate the proposed method on a large-scale driving dataset and show that our approach achieves significant improvements over the baselines. We also showcase the benefits of our approach under the annotator-in-the-loop setting.

[1]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Jianren Wang,et al.  3D Multi-Object Tracking: A Baseline and New Evaluation Metrics , 2019 .

[4]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[5]  Benjamin Sapp,et al.  MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction , 2019, CoRL.

[6]  Bin Yang,et al.  HDNET: Exploiting HD Maps for 3D Object Detection , 2018, CoRL.

[7]  Kurt Keutzer,et al.  LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[8]  Sergio Casas,et al.  PnPNet: End-to-End Perception and Prediction With Tracking in the Loop , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Sergio Casas,et al.  StrObe: Streaming Object Detection from LiDAR Packets , 2020, CoRL.

[10]  Dushyant Rao,et al.  Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Steven L. Waslander,et al.  Leveraging Temporal Data for Automatic Labelling of Static Vehicles , 2020, 2020 17th Conference on Computer and Robot Vision (CRV).

[12]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Sanja Fidler,et al.  Fast Interactive Object Annotation With Curve-GCN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Weijing Shi,et al.  Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Carlos Vallespi-Gonzalez,et al.  LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Tian Xia,et al.  Vehicle Detection from 3D Lidar Using Fully Convolutional Network , 2016, Robotics: Science and Systems.

[17]  Adrien Gaidon,et al.  Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xiaoyong Shen,et al.  STD: Sparse-to-Dense 3D Object Detector for Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Zhidong Deng,et al.  Fully Motion-Aware Network for Video Object Detection , 2018, ECCV.

[20]  Andrew Zisserman,et al.  Detect to Track and Track to Detect , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Sergio Casas,et al.  Implicit Latent Variable Model for Scene-Consistent Motion Forecasting , 2020, ECCV.

[22]  Trevor Darrell,et al.  Joint Monocular 3D Vehicle Detection and Tracking , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Yujie Wang,et al.  Flow-Guided Feature Aggregation for Video Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Sanja Fidler,et al.  Annotating Object Instances with a Polygon-RNN , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Bin Yang,et al.  PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Yanan Sun,et al.  3DSSD: Point-Based 3D Single Stage Object Detector , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[30]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Lei Zhang,et al.  Structure Aware Single-Stage 3D Object Detection From Point Cloud , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bin Yang,et al.  Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Bo Li,et al.  3D fully convolutional network for vehicle detection in point cloud , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[34]  Sergio Casas,et al.  IntentNet: Learning to Predict Intention from Raw Sensor Data , 2018, CoRL.

[35]  Konrad Schindler,et al.  Continuous Energy Minimization for Multitarget Tracking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[37]  Steven L. Waslander,et al.  Leveraging Pre-Trained 3D Object Detection Models for Fast Ground Truth Generation , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[38]  Luc Van Gool,et al.  Weakly Supervised 3D Object Detection from Lidar Point Cloud , 2020, ECCV.

[39]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.