Pix2Map: Cross-Modal Retrieval for Inferring Street Maps from Images

Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that this problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps, represented as discrete graphs that encode the topological layout of the visual surroundings. We conduct our experimental evaluation using the Argoverse dataset and show that it is indeed possible to accurately retrieve street maps corresponding to both seen and unseen roads solely from image data. Moreover, we show that our retrieved maps can be used to update or expand existing maps and even show proof-of-concept results for visual localization and image retrieval from spatial graphs.
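The retrieval step described above can be illustrated with a minimal sketch. It assumes that an image encoder and a graph encoder (neither is specified here) have already mapped the query image and a library of candidate street-map graphs into the same embedding space. The function names `l2_normalize` and `retrieve_graphs`, the use of cosine similarity, and the toy vectors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def retrieve_graphs(image_emb, graph_embs, k=1):
    """Return indices and similarities of the k graph embeddings closest to the image.

    image_emb:  (d,) embedding of one ego-view image (hypothetical encoder output).
    graph_embs: (n, d) embeddings of n candidate map graphs (hypothetical encoder output).
    Similarity is cosine similarity, which is an assumption of this sketch.
    """
    q = l2_normalize(np.asarray(image_emb, dtype=float))
    g = l2_normalize(np.asarray(graph_embs, dtype=float))
    sims = g @ q                                   # (n,) cosine similarities
    order = np.argsort(-sims, kind="stable")[:k]   # most similar first
    return order, sims[order]


# Toy library of three graph embeddings and one image embedding (made-up values).
graph_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
image_emb = np.array([0.9, 0.1])
top_idx, top_sims = retrieve_graphs(image_emb, graph_embs, k=2)
print(top_idx, top_sims)
```

In this framing, inferring a map for a new image reduces to a nearest-neighbor lookup over embeddings of existing map graphs, so the graph library can be embedded once and reused for every query.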
