LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors

We introduce a solution to large-scale Augmented Reality for outdoor scenes by registering camera images to textured Digital Elevation Models (DEMs). To accommodate the inherent differences in appearance between real images and DEMs, we train a cross-domain feature descriptor, using Structure-from-Motion (SfM) guided reconstructions to acquire training data. Our method runs efficiently on a mobile device and outperforms existing learned and hand-designed feature descriptors for this task.
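
As a rough illustration of how a cross-domain descriptor could be used for registration, the sketch below pairs descriptors extracted from a photograph with descriptors extracted from a rendered DEM view. The abstract does not say which matching strategy is used, so the mutual nearest-neighbour check here is an assumption, and the function name `mutual_nearest_matches` and its array inputs are hypothetical placeholders for the output of whatever descriptor network is trained.

```python
import numpy as np


def mutual_nearest_matches(photo_desc, render_desc):
    """Return index pairs (i, j) where photo descriptor i and render
    descriptor j are each other's nearest neighbour.

    photo_desc:  (N, D) array of descriptors from the camera image.
    render_desc: (M, D) array of descriptors from the rendered DEM.
    Both inputs are hypothetical; the matching rule is an assumption.
    """
    # L2-normalise so distances are comparable across the two domains.
    p = photo_desc / np.linalg.norm(photo_desc, axis=1, keepdims=True)
    r = render_desc / np.linalg.norm(render_desc, axis=1, keepdims=True)

    # For unit vectors, ||p - r||^2 = 2 - 2 * (p . r).
    dist = 2.0 - 2.0 * (p @ r.T)

    # Best render descriptor for each photo descriptor, and vice versa.
    nn_photo_to_render = dist.argmin(axis=1)
    nn_render_to_photo = dist.argmin(axis=0)

    # Keep a pair only if the choice is consistent in both directions.
    photo_idx = np.arange(p.shape[0])
    keep = nn_render_to_photo[nn_photo_to_render] == photo_idx
    return np.stack([photo_idx[keep], nn_photo_to_render[keep]], axis=1)
```

In a full pipeline, each matched DEM pixel would be back-projected to a 3D terrain point, and the resulting 2D-3D correspondences would be passed to a robust pose solver; that step is omitted here.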
