Mining flickr landmarks by modeling reconstruction sparsity

In recent years, there have been ever-growing geographical tagged photos on the community Web sites such as Flickr. Discovering touristic landmarks from these photos can help us to make better sense of our visual world. In this article, we report our work on mining landmarks from geotagged Flickr photos for city scene summarization and touristic recommendations. We begin by exploring the geographical and visual statistics of the Web users' photographing manner, based on which we conduct landmark mining in two steps: First, we propose to partition each city into geographical regions based on spectral clustering over the geotags of Flickr photos. Second, in each landmark region, we present a representative photo mining scheme based on sparse representation. Our main idea is to regard the landmark mining problem as a process to find photos whose visual signatures can be reconstructed using other photos of this landmark region with a minimal coding length. This sparse reconstruction scheme offers a general perspective to mine the representative photos. Indeed, by simplifying the data correlation constraints in our scheme, several previous works in representative photo discovery and landmark mining can be derived. Finally, we introduce a Hyperlink-Induced Topic Search model to refine our landmark ranking, which incorporates the community knowledge to simulate the landmark ranking problem as a dynamic page ranking problem. We have deployed our proposed landmark mining framework on a city scene summarization and navigation system, which works on one million geotagged Flickr photos coming from twenty worldwide metropolises. We have also quantitatively compared our scheme with several state-of-the-art works.

[1]  Mor Naaman,et al.  World explorer: visualizing aggregate data from unstructured text in geo-referenced collections , 2007, JCDL '07.

[2]  Dongyeop Kang,et al.  Multi-view Landmark Recognition in Large-scale Image Collections , 2011 .

[3]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[4]  Yaakov Tsaig,et al.  Fast Solution of $\ell _{1}$ -Norm Minimization Problems When the Solution May Be Sparse , 2008, IEEE Transactions on Information Theory.

[5]  D. Donoho,et al.  Atomic Decomposition by Basis Pursuit , 2001 .

[6]  Wolfram Burgard,et al.  Estimating landmark locations from geo-referenced photographs , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  Carlo Torniai,et al.  Sharing, Discovering and Browsing Geotagged Pictures on the World Wide Web , 2007, The Geospatial Web.

[8]  John Wright,et al.  Segmentation of multivariate mixed data via lossy coding and compression , 2007, Electronic Imaging.

[9]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Wei-Ying Ma,et al.  VirtualTour: an online travel assistant based on high quality images , 2006, MM '06.

[12]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[13]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[14]  Jiebo Luo,et al.  Inferring photographic location using geotagged web images , 2010, Multimedia Tools and Applications.

[15]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Yiannis Kompatsiaris,et al.  Cluster-Based Landmark and Event Detection for Tagged Photo Collections , 2011, IEEE MultiMedia.

[17]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[18]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[19]  R. Tibshirani,et al.  Regression shrinkage and selection via the lasso: a retrospective , 2011 .

[20]  James M. Rehg,et al.  Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  John Wright,et al.  Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Shumeet Baluja,et al.  Pagerank for product image search , 2008, WWW.

[23]  Tat-Seng Chua,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, CVPR.

[24]  Ulrike von Luxburg,et al.  Influence of graph construction on graph-based clustering measures , 2008, NIPS.

[25]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[26]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[27]  JiRongrong,et al.  Mining flickr landmarks by modeling reconstruction sparsity , 2011 .

[28]  Walaa M. Sheta,et al.  Automatic landmark identification in large virtual environment: a spatial data mining approach , 2005, Ninth International Conference on Information Visualisation (IV'05).

[29]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, ECCV.

[30]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[32]  Keiji Yanai,et al.  Mining Regional Representative Photos from Consumer-Generated Geotagged Photos , 2010, Handbook of Social Network Technologies.

[33]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[34]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Mor Naaman,et al.  How flickr helps us make sense of the world: context and content in community-contributed media collections , 2007, ACM Multimedia.

[36]  Yue Gao,et al.  W2Go: a travel guidance system by automatic landmark ranking , 2010, ACM Multimedia.

[37]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[38]  Thomas S. Huang,et al.  Image super-resolution as sparse representation of raw image patches , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[40]  Newton Lee,et al.  ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP) , 2007, CIE.

[41]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[42]  Xing Xie,et al.  Mining city landmarks from blogs by graph modeling , 2009, ACM Multimedia.

[43]  Steffen Staab,et al.  Exploiting Flickr Tags and Groups for Finding Landmark Photos , 2009, ECIR.