论文信息 - End-to-end Contextual Perception and Prediction with Interaction Transformer

End-to-end Contextual Perception and Prediction with Interaction Transformer

In this paper, we tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving. Towards this goal, we design a novel approach that explicitly takes into account the interactions between actors. To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture, which we call the Interaction Transformer. Importantly, our model can be trained end-to-end, and runs in real-time. We validate our approach on two challenging real-world datasets: ATG4D and nuScenes. We show that our approach can outperform the state-of-the-art on both datasets. In particular, we significantly improve the social compliance between the estimated future trajectories, resulting in far fewer collisions between the predicted actors.

[1] Yin Zhou,et al. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2] Mohan M. Trivedi,et al. Convolutional Social Pooling for Vehicle Trajectory Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3] Xiaogang Wang,et al. PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Chung Choo Chung,et al. Probabilistic vehicle trajectory prediction over occupancy grid map via recurrent neural network , 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).

[5] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[6] Bin Yang,et al. Deep Continuous Fusion for Multi-sensor 3D Object Detection , 2018, ECCV.

[7] Bin Yang,et al. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Sergio Casas,et al. End-To-End Interpretable Neural Motion Planner , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Silvio Savarese,et al. Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks , 2019, NeurIPS.

[10] Carlos Vallespi-Gonzalez,et al. LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Zhaoxin Li,et al. STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13] Henggang Cui,et al. Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving , 2018, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[14] Silvio Savarese,et al. SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Henggang Cui,et al. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[16] Renjie Liao,et al. SpAGNN: Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[17] Leonidas J. Guibas,et al. Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Akansel Cosgun,et al. Towards full automated drive in urban environments: A demonstration in GoMentum Station, California , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[19] Marco Pavone,et al. The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20] Mayank Bansal,et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst , 2018, Robotics: Science and Systems.

[21] Silvio Savarese,et al. Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Bin Yang,et al. PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[24] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[25] Jake Charland,et al. Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26] Anca D. Dragan,et al. Courteous Autonomous Cars , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[27] Kris M. Kitani,et al. Forecasting Interactive Dynamics of Pedestrians with Fictitious Play , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Benjin Zhu,et al. Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection , 2019, ArXiv.

[29] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Shaojie Shen,et al. Stereo R-CNN Based 3D Object Detection for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Bin Yang,et al. Multi-Task Multi-Sensor Fusion for 3D Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Ji Wan,et al. Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Xiaogang Wang,et al. Pedestrian Behavior Understanding and Prediction with Deep Neural Networks , 2016, ECCV.

[34] Behzad Dariush,et al. Looking to Relations for Future Trajectory Forecast , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35] Bin Yang,et al. HDNET: Exploiting HD Maps for 3D Object Detection , 2018, CoRL.

[36] Dinesh Manocha,et al. TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents , 2018, AAAI.

[37] Sanja Fidler,et al. Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Qiang Xu,et al. nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Huimin Ma,et al. 3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[40] Silvio Savarese,et al. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41] Sergio Casas,et al. IntentNet: Learning to Predict Intention from Raw Sensor Data , 2018, CoRL.

[42] Dushyant Rao,et al. Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[43] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44] Julius Ziegler,et al. Making Bertha Drive—An Autonomous Journey on a Historic Route , 2014, IEEE Intelligent Transportation Systems Magazine.

[45] Steven Lake Waslander,et al. Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[46] Silvio Savarese,et al. Single-source Attention Path Prediction Multi-source Attention Predicted Observed , 2018 .