VLASE: Vehicle Localization by Aggregating Semantic Edges

We propose VLASE, a framework that uses semantic edge features from images for on-road localization. Semantic edge features are edge contours that separate pairs of distinct object classes, such as building-sky, road-sidewalk, and building-ground. While prior work has shown promising results using skylines, that is, the boundary between prominent classes such as sky and building, we generalize this idea to 19 semantic classes. We extract semantic edge features using the CASENet architecture and use the VLAD framework to perform image retrieval. We achieve improvements over state-of-the-art localization algorithms such as SIFT-VLAD and its deep variant NetVLAD. An ablation study shows the importance of the different semantic classes, and our unified approach outperforms individual prominent features such as skylines. We also introduce the SLC Marathon dataset, a challenging dataset covering most of Salt Lake City with sufficient lighting variations.
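The retrieval step described above can be illustrated with a minimal sketch of VLAD aggregation followed by nearest-neighbor lookup. This is not the authors' implementation: the descriptor layout (one 21-dimensional vector per edge pixel, imagined here as 19 class scores plus two pixel coordinates), the random toy data, the codebook of 4 centers, and the function names `vlad_encode` and `retrieve` are all assumptions made for illustration. In practice the descriptors would come from a semantic edge detector such as CASENet and the codebook from clustering training descriptors.

```python
import numpy as np


def vlad_encode(descriptors, centers):
    """Aggregate local descriptors (N, D) into one VLAD vector of length K*D.

    centers is a (K, D) codebook, assumed to be precomputed (e.g. by k-means).
    """
    # Squared distance from every descriptor to every center: shape (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    k, d = centers.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Sum of residuals between the assigned descriptors and their center.
            vlad[i] = (members - centers[i]).sum(axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    # L2-normalize so that a dot product equals cosine similarity.
    return vlad / norm if norm > 0 else vlad


def retrieve(query_vec, database_vecs):
    """Return database indices ordered from most to least similar."""
    scores = database_vecs @ query_vec
    return np.argsort(-scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 3 database images, 50 descriptors each, D = 21.
    centers = rng.normal(size=(4, 21))
    images = [rng.normal(size=(50, 21)) for _ in range(3)]
    database = np.stack([vlad_encode(x, centers) for x in images])

    # Query with the descriptors of image 2; it should rank itself first.
    ranking = retrieve(vlad_encode(images[2], centers), database)
    print(ranking)
```

Because every image is reduced to one fixed-length, unit-norm vector, localization reduces to a similarity search against geotagged database vectors, and the location of the top-ranked image serves as the estimate.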
