Graph-based Topology Reasoning for Driving Scenes

Understanding the road genome is essential for autonomous driving. The problem has two aspects: the connectivity among lanes, and the assignment of traffic elements to lanes. A comprehensive topology reasoning method covering both is still lacking. On the one hand, previous map learning techniques struggle to derive lane connectivity under segmentation or laneline paradigms, while prior lane-topology approaches focus on centerline detection and neglect interaction modeling. On the other hand, traffic-element-to-lane assignment has been studied only in the image domain, leaving the construction of correspondences between the two views an unexplored challenge. To address these issues, we present TopoNet, the first end-to-end framework capable of abstracting traffic knowledge beyond conventional perception tasks. To capture the driving scene topology, we introduce three key designs: (1) an embedding module that incorporates semantic knowledge from 2D elements into a unified feature space; (2) a curated scene graph neural network that models relationships and enables feature interaction inside the network; (3) a scene knowledge graph that, rather than passing messages arbitrarily, differentiates prior knowledge across the various types of the road genome. We evaluate TopoNet on the challenging scene understanding benchmark OpenLane-V2, where our approach outperforms all previous works by a large margin on all perceptual and topological metrics. The code is released at https://github.com/OpenDriveLab/TopoNet
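As a rough illustration of the idea behind designs (2) and (3), the sketch below runs one round of message passing over a toy scene graph in which each relation type (successor lane, predecessor lane, assigned traffic element) has its own weight matrix. It is not the TopoNet implementation: the function name `relation_message_passing`, the adjacency layout, the feature size, and the ReLU update are all assumptions made for this example.

```python
import numpy as np

def relation_message_passing(lane_feats, te_feats, lane_adj, lane_te_adj, weights):
    """One round of relation-typed message passing (hypothetical sketch).

    lane_feats:  (L, D) lane centerline features
    te_feats:    (T, D) traffic element features, assumed already embedded
                 into the same feature space as the lanes
    lane_adj:    (L, L) with lane_adj[i, j] = 1 if lane i connects into lane j
    lane_te_adj: (L, T) with lane_te_adj[i, k] = 1 if element k is assigned to lane i
    weights:     dict of (D, D) matrices keyed by "self", "succ", "pred", "te"
    """
    self_msg = lane_feats @ weights["self"]
    # Each relation type aggregates its neighbours with a separate transform,
    # so messages are not mixed arbitrarily across relation types.
    succ_msg = lane_adj @ lane_feats @ weights["succ"]    # from successor lanes
    pred_msg = lane_adj.T @ lane_feats @ weights["pred"]  # from predecessor lanes
    te_msg = lane_te_adj @ te_feats @ weights["te"]       # from assigned traffic elements
    return np.maximum(self_msg + succ_msg + pred_msg + te_msg, 0.0)  # ReLU

# Toy scene: 3 lanes in a chain (0 -> 1 -> 2) and 2 traffic elements.
rng = np.random.default_rng(0)
dim = 4
lane_feats = rng.normal(size=(3, dim))
te_feats = rng.normal(size=(2, dim))
lane_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]], dtype=float)
lane_te_adj = np.array([[1, 0],
                        [0, 0],
                        [0, 1]], dtype=float)
weights = {name: rng.normal(scale=0.1, size=(dim, dim))
           for name in ("self", "succ", "pred", "te")}

updated = relation_message_passing(lane_feats, te_feats, lane_adj, lane_te_adj, weights)
print(updated.shape)  # (3, 4)
```

In a learned model the adjacency matrices would themselves be predicted rather than given; here they are fixed only to keep the example small.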
