Landmark Image Retrieval by Jointing Feature Refinement and Multimodal Classifier Learning

Landmark retrieval aims to return a set of images depicting the same landmark as the query image. Existing studies on landmark retrieval focus on exploiting the geometric properties of landmarks for visual similarity matching. However, social images of the same landmark often vary widely in visual content, while images of different landmarks can share common visual patterns. On the other hand, social images usually carry multimodal content, i.e., visual content and text tags, and each landmark has distinctive characteristics in both its visual and its textual content. Approaches based on visual similarity matching alone may therefore be ineffective in this setting. In this paper, we investigate whether the geographical correlation between visual content and text content can be exploited for landmark retrieval. In particular, we propose an effective multimodal landmark classification paradigm that leverages the multimodal content of social images for landmark retrieval by integrating feature refinement and multimodal landmark classifier learning in a joint model. Geo-tagged images are automatically labeled for classifier learning. Visual features are refined through low-rank matrix recovery, and a multimodal classifier with group-sparse regularization is learned from the automatically labeled images. Finally, candidate images are ranked by combining the classification result with a measure of the semantic consistency between their visual and text content. Experiments on real-world datasets demonstrate the superiority of the proposed approach over existing methods.
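The abstract names three building blocks without giving their formulas: low-rank refinement of visual features, a group-sparse multimodal classifier, and a fused ranking score. The block below is a minimal sketch of generic stand-ins for each, not the paper's joint model. It assumes (1) low-rank recovery as robust PCA, X = L + E, solved with the inexact augmented Lagrange multiplier method; (2) group sparsity as an l2,1-norm penalty on a least-squares classifier over concatenated visual and tag features, with one group per feature dimension, solved by proximal gradient; and (3) ranking as a weighted sum of the two scores. All function names, the fusion weight `alpha`, and the synthetic data are hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(M, tau):
    """Elementwise shrinkage: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def prox_l21(W, tau):
    """Row-wise group shrinkage: proximal operator of tau * sum_i ||W[i, :]||_2."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def low_rank_refine(X, lam=None, tol=1e-7, max_iter=500):
    """Assumed robust-PCA refinement via inexact ALM:
    min ||L||_* + lam * ||E||_1  subject to  X = L + E."""
    lam = lam if lam is not None else 1.0 / np.sqrt(max(X.shape))
    spec = np.linalg.norm(X, 2)
    norm_X = np.linalg.norm(X, "fro")
    Y = X / max(spec, np.abs(X).max() / lam)   # dual variable initialisation
    mu, rho = 1.25 / spec, 1.5
    mu_max = mu * 1e7                          # cap keeps mu finite
    L = np.zeros_like(X)
    E = np.zeros_like(X)
    for _ in range(max_iter):
        E = soft_threshold(X - L + Y / mu, lam / mu)   # sparse noise update
        L = svt(X - E + Y / mu, 1.0 / mu)              # low-rank update
        R = X - L - E                                  # constraint residual
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R, "fro") / norm_X < tol:
            break
    return L, E

def train_group_sparse(X, Y, gamma=0.1, n_iter=300):
    """Assumed classifier: proximal gradient (ISTA) for
    min_W 0.5 * ||X W - Y||_F^2 + gamma * ||W||_{2,1}."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = prox_l21(W - step * grad, step * gamma)
    return W

def rank_candidates(class_scores, consistency, alpha=0.5):
    """Hypothetical late fusion: weighted sum, candidate indices sorted best-first."""
    fused = alpha * np.asarray(class_scores) + (1 - alpha) * np.asarray(consistency)
    return np.argsort(-fused)

# Synthetic demo data (not from the paper).
rng = np.random.default_rng(0)
Xv = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 20))  # rank-3 visual features
Xv[rng.random(Xv.shape) < 0.05] += 10.0                           # sparse gross corruption
Xt = rng.random((60, 15))                                         # stand-in tag features
Y = np.eye(4)[rng.integers(0, 4, 60)]                             # one-hot landmark labels
L, E = low_rank_refine(Xv)
W = train_group_sparse(np.hstack([L, Xt]), Y, gamma=1.0)
order = rank_candidates([0.9, 0.1, 0.5], [0.3, 0.9, 0.6])
```

The l2,1 penalty zeroes whole rows of `W`, so a feature dimension is either kept or dropped for every landmark class at once. This is one common reading of "group sparse"; the paper's actual grouping may differ.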
