HDMapNet: A Local Semantic Map Learning and Evaluation Framework

Estimating local semantics from sensory inputs is a central component for high-definition map constructions in autonomous driving. However, traditional pipelines require a vast amount of human efforts and resources in annotating and maintaining the semantics in the map, which limits its scalability. In this paper, we introduce the problem of local semantic map learning, which dynamically constructs the vectorized semantics based on onboard sensor observations. Meanwhile, we introduce a local semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird’s-eye view. We benchmark HDMapNet on nuScenes dataset and show that in all settings, it performs better than baseline methods. Of note, our fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. In addition, we develop semanticlevel and instance-level metrics to evaluate the map learning performance. Finally, we showcase our method is capable of predicting a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem. Code and evaluation kit will be released to facilitate future development.

[1]  Bolei Zhou,et al.  Cross-View Semantic Segmentation for Sensing Surroundings , 2019, IEEE Robotics and Automation Letters.

[2]  Charles L. Lawson,et al.  Solving least squares problems , 1976, Classics in applied mathematics.

[3]  Christoph Stiller,et al.  Kalman Particle Filter for lane recognition on rural roads , 2009, 2009 IEEE Intelligent Vehicles Symposium.

[4]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[5]  Zhidong Deng,et al.  Segmentation of Drivable Road Using Deep Fully Convolutional Residual Network with Pyramid Pooling , 2018, Cognitive Computation.

[6]  Aleksandr V. Segal,et al.  Generalized-ICP , 2009, Robotics: Science and Systems.

[7]  Min Bai,et al.  Deep Multi-Sensor Lane Detection , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Ming Yang,et al.  Restricted Deformable Convolution-Based Road Scene Semantic Segmentation Using Surround View Cameras , 2018, IEEE Transactions on Intelligent Transportation Systems.

[10]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Sheng-Fuu Lin,et al.  Lane detection using color-based segmentation , 2005, IEEE Proceedings. Intelligent Vehicles Symposium, 2005..

[12]  Luc Van Gool,et al.  Segmentation-Based Urban Traffic Scene Understanding , 2009, BMVC.

[13]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Peter Biber,et al.  The normal distributions transform: a new approach to laser scan matching , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[16]  Jialin Jiao,et al.  Machine Learning Assisted High-Definition Map Creation , 2018, 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC).

[17]  Weiqiang Ren,et al.  LaneNet: Real-Time Lane Detection Networks for Autonomous Driving , 2018, ArXiv.

[18]  Luc Van Gool,et al.  Towards End-to-End Lane Detection: an Instance Segmentation Approach , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).

[19]  Yin Zhou,et al.  End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds , 2019, CoRL.

[20]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Tae Eun Choe,et al.  Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection , 2020, ECCV.

[22]  Dan Levi,et al.  3D-LaneNet: End-to-End 3D Multiple Lane Detection , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Luc Van Gool,et al.  Semantic Instance Segmentation for Autonomous Driving , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Timo Sämann,et al.  Efficient Semantic Segmentation for Visual Bird's-eye View Interpretation , 2018, IAS.

[26]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[27]  Jianxiong Xiao,et al.  Semantic alignment of LiDAR data at city scale , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[29]  Yann LeCun,et al.  Road Scene Segmentation from a Single Image , 2012, ECCV.

[30]  Bidyut Baran Chaudhuri,et al.  A survey of Hough Transform , 2015, Pattern Recognit..

[31]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  F. Dellaert Factor Graphs and GTSAM: A Hands-on Introduction , 2012 .

[35]  Roland Siegwart,et al.  A Review of Point Cloud Registration Algorithms for Mobile Robotics , 2015, Found. Trends Robotics.

[36]  Lu Feng,et al.  A robust pose graph approach for city scale LiDAR mapping , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[37]  Sanja Fidler,et al.  Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D , 2020, ECCV.

[38]  Suyash P. Awate,et al.  Computer Vision, Graphics, and Image Processing , 2016, Lecture Notes in Computer Science.

[39]  Roberto Cipolla,et al.  Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Junqiang Xi,et al.  A novel lane detection based on geometrical model and Gabor filter , 2010, 2010 IEEE Intelligent Vehicles Symposium.