Learning salient visual word for scalable mobile image retrieval

Thanks to portable, high-quality phone cameras, people now readily take photos and share them with friends on social networks. If a user wants to obtain relevant information about an image, content-based image retrieval can be used. Taking the limited bandwidth and instability of the wireless channel into account, this paper proposes an effective scalable mobile image retrieval approach that exploits an advantage of the mobile end: people usually take multiple photos of an object from different viewpoints and with different focuses. The proposed algorithm first determines the truly relevant photos according to visual similarity on the mobile end, then learns salient visual words by exploring saliency from these relevant images, and finally determines the contribution order of the salient visual words to carry out scalable retrieval. Moreover, to improve retrieval performance, soft spatial verification is proposed to re-rank the results. Compared with existing mobile image retrieval approaches, our approach transmits less data and reduces the computational cost of spatial verification. Most importantly, when bandwidth is limited, we can transmit only a part of the features according to their contributions to retrieval. Experimental results show the effectiveness of the proposed approach.

We propose a salient feature learning approach for mobile-end image retrieval. The algorithm makes full use of the multiple photos taken at the mobile end to extract saliency. We determine the contribution of each salient feature to image retrieval. The result is a scalable image retrieval approach with an extremely low bit rate.
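The three steps described above (select relevant photos, rank salient visual words, transmit only what the bandwidth allows) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify the similarity measure, the saliency model, or the coding cost. The sketch assumes that each photo is already quantized into a set of visual word ids, that Jaccard similarity with a threshold of 0.25 decides relevance, that the number of relevant photos sharing a word stands in for its saliency, and that each word costs 20 bits to send. All function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of visual word ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevant_photos(query, others, threshold=0.25):
    """Keep the photos whose word sets are similar enough to the query photo.

    The threshold is an assumed value, not one taken from the paper.
    """
    return [p for p in others if jaccard(query, p) >= threshold]


def rank_salient_words(query, relevant):
    """Order the query's words by how many relevant photos also contain them.

    Recurrence across views is only a stand-in for saliency here.
    Words shared with no relevant photo are dropped; ties are broken
    by word id so the order is deterministic.
    """
    counts = Counter()
    for photo in relevant:
        counts.update(query & photo)
    return sorted(counts, key=lambda w: (-counts[w], w))


def words_to_send(ranked, bit_budget, bits_per_word=20):
    """Truncate the ranked list so that it fits the available bit budget."""
    return ranked[: bit_budget // bits_per_word]


# Toy data: one query photo and three other photos taken on the phone.
query = {1, 2, 3, 4, 5}
others = [{1, 2, 3, 9}, {1, 2, 7, 8}, {40, 41, 42}]

relevant = relevant_photos(query, others)       # the third photo is rejected
ranked = rank_salient_words(query, relevant)    # [1, 2, 3]
payload = words_to_send(ranked, bit_budget=40)  # [1, 2]
```

Because the words are sent in ranked order, the server can start retrieval with whatever prefix arrives, which is what makes the scheme scalable under a varying bit budget.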
