Paper information - On Localizing a Camera from a Single Image

On Localizing a Camera from a Single Image

Public cameras often have limited metadata describing their attributes. A key missing attribute is the camera's precise location, which would make it possible to pinpoint where events seen by the camera take place. In this paper, we explore the following question: under what conditions is it possible to estimate the location of a camera from a single image taken by that camera? We show that, using a judicious combination of projective geometry, neural networks, and crowd-sourced annotations from human workers, it is possible to position 95% of the images in our test data set to within 12 m. This performance is two orders of magnitude better than that of PoseNet, a state-of-the-art neural network that, when trained on a large corpus of images in an area, can estimate the pose of a single image. Finally, we show that the camera's inferred position and intrinsic parameters can help design a number of virtual sensors, all of which are reasonably accurate.

[1] Roberto Cipolla,et al. Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[2] Xiaochen Liu,et al. TAR: Enabling Fine-Grained Targeted Advertising in Retail Stores , 2018, MobiSys.

[3] Roberto Cipolla,et al. Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Gary R. Bradski,et al. ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[5] Simon J. D. Prince,et al. Computer Vision: Models, Learning, and Inference , 2012.

[6] Danfei Xu,et al. Topometric localization on a road network , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7] Vincent Lepetit,et al. Accurate Camera Registration in Urban Environments Using High-Level Feature Matching , 2017, BMVC.

[8] Robert C. Bolles,et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[9] Andrew W. Fitzgibbon,et al. Exploiting uncertainty in regression forests for accurate camera relocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Mayank Bansal,et al. Geometric Urban Geo-localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Rafael Grompone von Gioi,et al. Finding Vanishing Points via Point Alignments in Image Primal and Dual Domains , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12] David A. Forsyth,et al. Utility data annotation with Amazon Mechanical Turk , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[13] Wolfram Burgard,et al. Deep Auxiliary Learning for Visual Localization and Odometry , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[14] Jana Kosecka,et al. 3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Xiaogang Wang,et al. Intelligent multi-camera video surveillance: A review , 2013, Pattern Recognit. Lett.

[16] Tomás Pajdla,et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[18] Torsten Sattler,et al. Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Wei Zhang,et al. Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[20] Roberto Cipolla,et al. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21] Wei Zhang,et al. Video Compass , 2002, ECCV.

[22] Marie-Odile Berger,et al. A Simple and Effective Method to Detect Orthogonal Vanishing Points in Uncalibrated Images of Man-Made Environments , 2016, Eurographics.

[23] Juho Kannala,et al. Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization , 2018, ECCV Workshops.

[24] Vikas Kumar,et al. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones , 2010, MobiSys '10.

[25] Marc Pollefeys,et al. 3-line RANSAC for orthogonal vanishing point detection , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[26] Mubarak Shah,et al. Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[27] Johannes Schöning,et al. PhotoMap: using spontaneously taken images of public maps for pedestrian navigation tasks on mobile devices , 2009, Mobile HCI.

[28] M. Gordan,et al. Camera calibration using two or three vanishing points , 2012, 2012 Federated Conference on Computer Science and Information Systems (FedCSIS).

[29] Ramesh Govindan,et al. Satyam: Democratizing Groundtruth for Machine Vision , 2018, ArXiv.

[30] Li Weng,et al. Cross-domain image localization by adaptive feature fusion , 2017, 2017 Joint Urban Remote Sensing Event (JURSE).

[31] Torsten Sattler,et al. Hyperpoints and Fine Vocabularies for Large-Scale Location Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32] Yair Movshovitz-Attias,et al. Ontological supervision for fine grained classification of Street View storefronts , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Andrew W. Fitzgibbon,et al. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[35] Wolfram Burgard,et al. VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[36] David Nistér,et al. An efficient solution to the five-point relative pose problem , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Torsten Sattler,et al. Camera Pose Voting for Large-Scale Image-Based Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38] Michael Isard,et al. Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[39] Luigi di Stefano,et al. On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40] J. J. Koenderink,et al. Affine structure from motion , 1991, Journal of the Optical Society of America. A, Optics and image science.

[41] Xin Chen,et al. City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[42] Kristin J. Dana,et al. Compact representation of bidirectional texture functions , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[43] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44] Agnès Desolneux,et al. Vanishing Point Detection without Any A Priori Information , 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[45] Reinhard Koch,et al. Pose estimation and map building with a Time-Of-Flight-camera for robot navigation , 2008, Int. J. Intell. Syst. Technol. Appl.

[46] Carsten Rother,et al. A New Approach for Vanishing Point Detection in Architectural Environments , 2000, BMVC.

[47] David G. Lowe,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004.

[48] Steven A. Shafer,et al. What is the center of the image? , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[49] Allan Hanbury,et al. Robust camera self-calibration from monocular images of Manhattan worlds , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.