Online Lane Graph Extraction from Onboard Video

Autonomous driving requires a structured understanding of the surrounding road network to navigate. One of the most common and useful representations of this understanding is the bird's-eye-view (BEV) lane graph. In this work, we use the video stream from an onboard camera for online extraction of the surrounding lane graph. Using video instead of a single image as input brings both benefits and challenges in combining information from different timesteps. We study these challenges using three different approaches. The first approach is a post-processing step that merges single-frame lane graph estimates into a unified lane graph. The second approach uses spatiotemporal embeddings in the transformer to let the network discover the best temporal aggregation strategy. The third, which is our proposed method, performs early temporal aggregation through explicit BEV projection and alignment of framewise features. A single model of this simple yet effective method can process any number of images, including one, to produce accurate lane graphs. Experiments on the nuScenes and Argoverse datasets show the validity of all three approaches while highlighting the superiority of the proposed method. The code will be made public.
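To make the idea of aligning framewise BEV features concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes each frame already provides a BEV feature grid of shape (C, H, W) together with a planar ego pose (x, y, yaw); the function names, the grid convention (ego at the grid center, x along columns, y along rows), nearest-neighbour sampling, and plain averaging are all illustrative assumptions. The abstract does not specify how the aligned features are fused inside the network.

```python
import numpy as np

def pose_matrix(x, y, yaw):
    """3x3 homogeneous transform mapping ego-frame points to world-frame points."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def warp_bev(feat, pose_src, pose_dst, res=1.0):
    """Resample a (C, H, W) BEV map from the source ego frame into the
    destination ego frame. Returns the warped map and an (H, W) validity mask."""
    C, H, W = feat.shape
    # Metric coordinates of every destination cell center (ego at grid center).
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    xs = (cols - (W - 1) / 2.0) * res
    ys = (rows - (H - 1) / 2.0) * res
    pts_dst = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    # Destination ego frame -> world -> source ego frame.
    T = np.linalg.inv(pose_matrix(*pose_src)) @ pose_matrix(*pose_dst)
    pts_src = T @ pts_dst
    # Nearest source cell for each destination cell (assumed sampling scheme).
    src_cols = np.rint(pts_src[0] / res + (W - 1) / 2.0).astype(int)
    src_rows = np.rint(pts_src[1] / res + (H - 1) / 2.0).astype(int)
    valid = (src_rows >= 0) & (src_rows < H) & (src_cols >= 0) & (src_cols < W)
    warped = np.zeros((C, H * W), dtype=float)
    warped[:, valid] = feat[:, src_rows[valid], src_cols[valid]]
    return warped.reshape(C, H, W), valid.reshape(H, W)

def aggregate(feats, poses, res=1.0):
    """Average the aligned BEV maps over the frames that observe each cell.
    The last frame is used as the reference frame; works for one or more frames."""
    ref = poses[-1]
    total = np.zeros(feats[-1].shape, dtype=float)
    count = np.zeros(feats[-1].shape[1:], dtype=float)
    for f, p in zip(feats, poses):
        w, m = warp_bev(f, p, ref, res)
        total += w
        count += m
    return total / np.maximum(count, 1.0)

# Two frames: the ego vehicle moves one cell along +x between them.
f0 = np.zeros((1, 5, 5)); f0[0, 2, 3] = 1.0   # landmark seen in the earlier frame
f1 = np.zeros((1, 5, 5)); f1[0, 2, 2] = 1.0   # same landmark seen in the current frame
fused = aggregate([f0, f1], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

Because the loop runs over however many frames are supplied, the same function handles a single image as well as a longer clip, which mirrors the property the abstract claims for the proposed model.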
