Paper information - What makes Paris look like Paris?

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g., windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task, as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose a discriminative clustering approach that can take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically informed image retrieval.
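The search problem in the abstract (which patches are both frequent and geographically informative?) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the algorithm, so the code below substitutes a simple nearest-neighbour vote for the discriminative clustering step. The function names, the 2-D toy descriptors, and the choice of `k` are all assumptions made for illustration. The only supervision used is a weak per-patch flag saying whether the patch came from the target area.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def geo_informativeness(index, descriptors, in_target, k=2):
    """Fraction of the k nearest other patches that come from the target area.

    A score near 1.0 means the patch has many look-alikes, and those
    look-alikes occur mostly inside the target area (hypothetical criterion).
    """
    if len(descriptors) < 2:
        raise ValueError("need at least two patches to compare")
    others = [j for j in range(len(descriptors)) if j != index]
    others.sort(key=lambda j: squared_distance(descriptors[index], descriptors[j]))
    nearest = others[:k]
    return sum(1 for j in nearest if in_target[j]) / len(nearest)


def rank_candidates(descriptors, in_target, k=2):
    """Rank target-area patches from most to least geo-informative."""
    candidates = [i for i in range(len(descriptors)) if in_target[i]]
    scored = [(geo_informativeness(i, descriptors, in_target, k), i) for i in candidates]
    # Highest score first; ties broken by patch index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


# Toy data (assumed): each tuple stands in for a patch descriptor.
descriptors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # a style seen only in the target area
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),  # a style seen mostly elsewhere
]
in_target = [True, True, True, True, False, False]

print(rank_candidates(descriptors, in_target, k=2))
```

In a fuller pipeline, the top-ranked patches would seed clusters that are then refined by training a detector per cluster against patches from other areas; the vote above only stands in for the initial candidate selection.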
