S-BEV: Semantic Birds-Eye View Representation for Weather and Lighting Invariant 3-DoF Localization

We describe a lightweight, weather- and lighting-invariant Semantic Bird’s Eye View (S-BEV) signature for vision-based vehicle re-localization. A topological map of S-BEV signatures is created during the first traversal of the route, and these signatures are used for coarse localization in subsequent traversals. A fine-grained localizer is then trained to output the global 3-DoF pose of the vehicle from its S-BEV and its coarse localization. We conduct experiments on the vKITTI2 virtual dataset and show the potential of the S-BEV to be robust to weather and lighting. We also demonstrate results with 2 vehicles on a 22 km long highway route in the Ford AV dataset.
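
The coarse localization step can be read as retrieval against the topological map: the query S-BEV is compared with each stored signature and the closest node is returned. The sketch below illustrates only that idea under stated assumptions. The signature encoding (a flattened grid of semantic class IDs), the distance (fraction of mismatching cells), and the names `signature_distance` and `coarse_localize` are hypothetical and are not taken from the paper, which does not specify them here.

```python
from typing import List, Sequence, Tuple

# Assumed encoding: a flattened BEV grid of integer semantic class IDs.
Signature = Sequence[int]


def signature_distance(a: Signature, b: Signature) -> float:
    """Fraction of grid cells whose labels differ (an assumed metric)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def coarse_localize(query: Signature,
                    topo_map: List[Signature]) -> Tuple[int, float]:
    """Return (index of the closest map node, its distance to the query)."""
    if not topo_map:
        raise ValueError("topological map is empty")
    distances = [signature_distance(query, node) for node in topo_map]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Toy 2x3 grids, flattened row-major (hypothetical labels):
# 0 = road, 1 = lane marking, 2 = vegetation.
topo_map = [
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 0, 2],
    [1, 0, 0, 0, 0, 2],
]
query = [0, 1, 0, 2, 2, 2]

node, dist = coarse_localize(query, topo_map)
print(node, round(dist, 3))
```

In this toy case the query differs from the second map node in a single cell, so that node is selected. A fine-grained localizer would then refine the pose starting from the selected node; that stage is learned and is not sketched here.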
