Topology Reasoning for Driving Scenes

Understanding the road genome is essential to realizing autonomous driving. This demanding problem has two aspects: the connectivity between lanes, and the assignment relationship between lanes and traffic elements. A comprehensive topology reasoning method covering both is still lacking. On the one hand, previous map learning techniques struggle to derive lane connectivity with segmentation or laneline paradigms, while prior lane topology-oriented approaches focus on centerline detection and neglect interaction modeling. On the other hand, the traffic element to lane assignment problem has been confined to the image domain, leaving the construction of correspondences across the two views an unexplored challenge. To address these issues, we present TopoNet, the first end-to-end framework capable of abstracting traffic knowledge beyond conventional perception tasks. To capture the driving scene topology, we introduce three key designs: (1) an embedding module that incorporates semantic knowledge from 2D elements into a unified feature space; (2) a curated scene graph neural network that models relationships and enables feature interaction inside the network; and (3) a scene knowledge graph that, instead of transmitting messages arbitrarily, differentiates prior knowledge according to the type of road genome element. We evaluate TopoNet on the challenging scene understanding benchmark OpenLane-V2, where our approach outperforms all previous works by a large margin on all perceptual and topological metrics. The code will be released soon.
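As a rough illustration of designs (2) and (3), the sketch below runs one round of message passing over a toy scene graph in which lanes receive messages from connected lanes and from assigned traffic elements, with a separate weight per relation type. It is not TopoNet's implementation: the function name `update_lanes`, the scalar relation weights, the mean aggregation, and the toy features are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def update_lanes(
    lane_feats: Dict[str, Vector],
    element_feats: Dict[str, Vector],
    lane_edges: List[Tuple[str, str]],    # (source lane, destination lane)
    assign_edges: List[Tuple[str, str]],  # (traffic element, lane)
    relation_weight: Dict[str, float],    # hypothetical per-relation scalars
) -> Dict[str, Vector]:
    """One message-passing round: each lane adds weighted neighbor means."""
    dim = len(next(iter(lane_feats.values())))
    updated: Dict[str, Vector] = {}
    for lane_id, feat in lane_feats.items():
        # Collect incoming messages separately for each relation type.
        lane_nbrs = [lane_feats[src] for src, dst in lane_edges if dst == lane_id]
        elem_nbrs = [element_feats[e] for e, dst in assign_edges if dst == lane_id]
        lane_msg = mean(lane_nbrs, dim)
        elem_msg = mean(elem_nbrs, dim)
        # Relation-specific weights keep the two kinds of knowledge distinct.
        updated[lane_id] = [
            feat[i]
            + relation_weight["lane_to_lane"] * lane_msg[i]
            + relation_weight["element_to_lane"] * elem_msg[i]
            for i in range(dim)
        ]
    return updated


# Toy scene (assumed values): lane "a" feeds lane "b"; a light governs "b".
lane_feats = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
element_feats = {"light": [2.0, 2.0]}
updated = update_lanes(
    lane_feats,
    element_feats,
    lane_edges=[("a", "b")],
    assign_edges=[("light", "b")],
    relation_weight={"lane_to_lane": 0.5, "element_to_lane": 0.25},
)
print(updated)
```

In this toy scene, lane "a" has no incoming edges and keeps its feature, while lane "b" mixes in information from both its predecessor lane and its assigned traffic element. A learned model would replace the scalar weights with trainable, relation-specific transformations.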
