24/7 Place Recognition by View Synthesis

We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.

[1]  Horst Bischof,et al.  From structure-from-motion point clouds to fast location recognition , 2009, CVPR.

[2]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[3]  Jana Kosecka,et al.  Probabilistic location recognition using reduced feature set , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[4]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  David Martin,et al.  Street View Motion-from-Structure-from-Motion , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Hervé Jégou,et al.  Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[8]  Torsten Sattler,et al.  SIFT-Realistic Rendering , 2013, 2013 International Conference on 3D Vision.

[9]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[10]  Mathieu Aubry,et al.  Painting-to-3D model alignment via discriminative visual elements , 2014, TOGS.

[11]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[12]  Xiaofeng Tao,et al.  Transient attributes for high-level understanding and editing of outdoor scenes , 2014, ACM Trans. Graph..

[13]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Guillaume Gravier,et al.  Oriented pooling for dense and non-dense rotation-invariant features , 2013, BMVC.

[15]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[17]  Paul Newman,et al.  Highly scalable appearance-only SLAM - FAB-MAP 2.0 , 2009, Robotics: Science and Systems.

[18]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Noah Snavely,et al.  Minimal Scene Descriptions from Structure from Motion Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Torsten Sattler,et al.  Image Retrieval for Image-Based Localization Revisited , 2012, BMVC.

[24]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[25]  Masatoshi Okutomi,et al.  Visual Place Recognition with Repetitive Structures , 2013, CVPR.

[26]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Torsten Sattler,et al.  Improving Image-Based Localization by Active Correspondence Search , 2012, ECCV.

[28]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[29]  Jan-Michael Frahm,et al.  3D model matching with Viewpoint-Invariant Patches (VIP) , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[31]  Pascal Fua,et al.  Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[32]  Andrew Zisserman,et al.  DisLocation: Scalable Descriptor Distinctiveness for Location Recognition , 2014, ACCV.

[33]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Robert Pless,et al.  Geolocating Static Cameras , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[35]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Steven M. Seitz,et al.  Accurate Geo-Registration by Ground-to-Aerial Image Matching , 2014, 2014 2nd International Conference on 3D Vision.

[37]  Noah Snavely,et al.  Reasoning about Photo Collections using Models of Outdoor Illumination , 2014, BMVC.

[38]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[39]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[40]  Noah Snavely,et al.  Graph-Based Discriminative Learning for Location Recognition , 2013, International Journal of Computer Vision.

[41]  Noah Snavely,et al.  Image matching using local symmetry features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.