Learning to tag

Social tagging provides valuable information for large-scale web image retrieval. It is ontology-free and easy to obtain; however, irrelevant tags frequently appear, and users typically do not tag every semantic object in an image, a problem known as semantic loss. To reduce noise and compensate for semantic loss, tag recommendation has been proposed in the literature. However, current recommendation methods simply rank related tags by a single modality, tag co-occurrence over the whole dataset, and ignore other modalities such as visual correlation. This paper proposes a multi-modality recommendation method based on both tag and visual correlation, and formulates tag recommendation as a learning problem. Each modality is used to generate a ranking feature, and the RankBoost algorithm is applied to learn an optimal combination of these ranking features from the different modalities. Experiments on Flickr data demonstrate the effectiveness of this learning-based multi-modality recommendation strategy.
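The idea of learning a combination of per-modality ranking features can be illustrated with a small sketch. The code below is not the paper's implementation: it is a simplified RankBoost with threshold weak rankers, and the function names (`rankboost`, `score`), the two-feature layout (tag co-occurrence, visual correlation), the toy scores, and the relevance labels are all assumptions made for illustration.

```python
import math

def weak(feats, k, theta):
    """Threshold weak ranker: 1 if ranking feature k exceeds theta, else 0."""
    return 1.0 if feats[k] > theta else 0.0

def rankboost(features, pairs, rounds=5):
    """Simplified RankBoost.

    features: dict mapping candidate tag -> list of ranking-feature scores
    pairs:    list of (lower, higher) tags; `higher` should outrank `lower`
    Returns a list of (alpha, feature_index, threshold) weak rankers.
    """
    n_feat = len(next(iter(features.values())))
    # Start from a uniform distribution over the preference pairs.
    dist = {p: 1.0 / len(pairs) for p in pairs}
    # Candidate weak rankers: every observed value of every feature.
    candidates = sorted({(k, f[k]) for f in features.values() for k in range(n_feat)})
    model = []
    for _ in range(rounds):
        # Pick the weak ranker that best agrees with the weighted pairs.
        best = None
        for k, theta in candidates:
            r = sum(w * (weak(features[hi], k, theta) - weak(features[lo], k, theta))
                    for (lo, hi), w in dist.items())
            if best is None or abs(r) > abs(best[0]):
                best = (r, k, theta)
        r, k, theta = best
        r = max(min(r, 1 - 1e-6), -1 + 1e-6)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, k, theta))
        # Reweight: pairs the weak ranker orders wrongly gain weight.
        for lo, hi in list(dist):
            diff = weak(features[lo], k, theta) - weak(features[hi], k, theta)
            dist[(lo, hi)] *= math.exp(alpha * diff)
        z = sum(dist.values())
        for p in dist:
            dist[p] /= z
    return model

def score(model, feats):
    """Final ranking score: weighted sum of the selected weak rankers."""
    return sum(a * weak(feats, k, theta) for a, k, theta in model)

# Toy data (hypothetical): [tag co-occurrence score, visual correlation score].
features = {
    "beach": [0.9, 0.80],
    "sea":   [0.7, 0.90],
    "sand":  [0.2, 0.85],
    "car":   [0.6, 0.10],
    "2008":  [0.8, 0.05],
}
relevant = ["beach", "sea", "sand"]
irrelevant = ["car", "2008"]
pairs = [(lo, hi) for lo in irrelevant for hi in relevant]

model = rankboost(features, pairs)
ranked = sorted(features, key=lambda t: score(model, features[t]), reverse=True)
print(ranked)
```

In this toy setup, tag co-occurrence alone would place "2008" above "sand", while the learned combination can draw on the visual feature to separate the relevant tags from the irrelevant ones.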
