Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving

Understanding the complex traffic environment is crucial for self-driving vehicles. Existing benchmarks in autonomous driving mainly cast scene understanding as a perception problem, e.g., perceiving lane lines with vanilla detection or segmentation methods. We argue that such a perception pipeline provides limited information for autonomous vehicles to drive correctly, especially without the aid of a high-definition (HD) map. For instance, following the wrong traffic signal at a complicated crossroad could lead to a catastrophic incident. By introducing Road Genome (OpenLane-V2), we intend to shift the community's attention and take a step beyond perception, to the task of topology reasoning for scene structure. The goal of Road Genome is to understand the scene structure by investigating the relationships among perceived entities, namely traffic elements and lanes. Built on top of prevailing datasets, the new benchmark comprises 2,000 sequences of multi-view images captured from diverse real-world scenarios. We annotate the data with high-quality manual checks in the loop. Three subtasks constitute the gist of Road Genome, including the 3D lane detection inherited from OpenLane. We have hosted or will host challenges at top-tier venues.
