24/7 Place Recognition by View Synthesis

We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings being built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.

[1]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[2]  Noah Snavely,et al.  Minimal Scene Descriptions from Structure from Motion Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Torsten Sattler,et al.  Large-Scale Location Recognition and the Geometric Burstiness Problem , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Xiaofeng Tao,et al.  Transient attributes for high-level understanding and editing of outdoor scenes , 2014, ACM Trans. Graph..

[5]  Noah Snavely,et al.  Reasoning about Photo Collections using Models of Outdoor Illumination , 2014, BMVC.

[6]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[7]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Andrew Zisserman,et al.  DisLocation: Scalable Descriptor Distinctiveness for Location Recognition , 2014, ACCV.

[9]  Pascal Fua,et al.  Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[10]  Noah Snavely,et al.  Graph-Based Discriminative Learning for Location Recognition , 2013, International Journal of Computer Vision.

[11]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[13]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[15]  Masatoshi Okutomi,et al.  Multi-view Inverse Rendering Under Arbitrary Illumination and Albedo , 2016, ECCV.

[16]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[18]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[20]  Noah Snavely,et al.  Scene Chronology , 2014, ECCV.

[21]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Torsten Sattler,et al.  Improving Image-Based Localization by Active Correspondence Search , 2012, ECCV.

[23]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[24]  Jian Sun,et al.  Optimized Product Quantization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Noah Snavely,et al.  Image matching using local symmetry features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Hervé Jégou,et al.  Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[28]  Torsten Sattler,et al.  SIFT-Realistic Rendering , 2013, 2013 International Conference on 3D Vision.

[29]  Masatoshi Okutomi,et al.  24/7 Place Recognition by View Synthesis , 2015, CVPR.

[30]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[31]  Masatoshi Okutomi,et al.  Visual Place Recognition with Repetitive Structures , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Atsuto Maki,et al.  Factors of Transferability for a Generic ConvNet Representation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Robert Pless,et al.  Geolocating Static Cameras , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[34]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Paul Newman,et al.  Illumination Invariant Imaging : Applications in Robust Vision-based Localisation , Mapping and Classification for Autonomous Vehicles , 2014 .

[36]  Torsten Sattler,et al.  Image Retrieval for Image-Based Localization Revisited , 2012, BMVC.

[37]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[38]  Jan-Michael Frahm,et al.  3D model matching with Viewpoint-Invariant Patches (VIP) , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Steven M. Seitz,et al.  Accurate Geo-Registration by Ground-to-Aerial Image Matching , 2014, 2014 2nd International Conference on 3D Vision.

[40]  Atsuto Maki,et al.  A Baseline for Visual Instance Retrieval with Deep Convolutional Networks , 2014, ICLR 2015.

[41]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[42]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Torsten Sattler,et al.  Hyperpoints and Fine Vocabularies for Large-Scale Location Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[46]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Guillaume Gravier,et al.  Oriented pooling for dense and non-dense rotation-invariant features , 2013, BMVC.

[48]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[49]  Michael Milford,et al.  Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free , 2015, Robotics: Science and Systems.

[50]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[51]  David J. Fleet,et al.  Cartesian K-Means , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Mathieu Aubry,et al.  Painting-to-3D model alignment via discriminative visual elements , 2014, TOGS.

[53]  David Martin,et al.  Street View Motion-from-Structure-from-Motion , 2013, 2013 IEEE International Conference on Computer Vision.

[54]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[55]  Jana Kosecka,et al.  Probabilistic location recognition using reduced feature set , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[56]  Niko Sünderhauf,et al.  Superpixel-based appearance change prediction for long-term navigation across seasons , 2014, Robotics Auton. Syst..

[57]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[59]  Paul Newman,et al.  Highly scalable appearance-only SLAM - FAB-MAP 2.0 , 2009, Robotics: Science and Systems.