Paper Information - Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera

Autonomous navigation requires scene understanding of the action space in order to move or anticipate events. For planner agents moving on the ground plane, such as autonomous vehicles, this translates to scene understanding in the bird's-eye view (BEV). However, the onboard cameras of autonomous cars are customarily mounted horizontally for a better view of the surroundings. In this work, we study scene understanding in the form of online estimation of semantic bird's-eye-view HD-maps using the video input from a single onboard camera. We study three key aspects of this task: image-level understanding, BEV-level understanding, and the aggregation of temporal information. Building on these three pillars, we propose a novel architecture that combines them. In our extensive experiments, we demonstrate that the considered aspects are complementary to each other for HD-map understanding. Furthermore, the proposed architecture significantly surpasses the current state of the art.
