Content-Irrelevant Tag Cleansing via Bi-Layer Clustering and Peer Cooperation

User-provided tags for social images have benefited many applications, such as social image organization, summarization and retrieval. Because users rely on their own knowledge and personalized language to describe the visual content of social images, these social tags are often too imprecise and ambiguous to be exploited directly. In this paper, we discover content-similar images (peers) and leverage the relationships among these images (peer cooperation) to handle the problem of content-irrelevant tags. A bi-layer clustering framework for discovering content-similar images is proposed to divide an image collection into different groups, and the tags of the peers in each group are cleaned jointly based on tag statistics and relevance. The relevance of tags, measured by Google Distance, is used to generate the first-layer clustering, and the bi-modality similarity of images is then used to perform the second-layer clustering. Based on the bi-layer clustering, we utilize the peers in a group to identify their content-irrelevant tags. Finally, an extended Fisher’s criterion is proposed to decide the proper number of content-irrelevant tags. To verify the effectiveness of the proposed technique, we conduct experiments on social images from Flickr and on a standard benchmark. The comparison experiments show that the proposed algorithm achieves positive results for both tag cleansing and image retrieval.
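As an illustration of the tag relevance measure mentioned above, the following is a minimal sketch of the standard Normalized Google Distance formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). It assumes that the frequencies are counted over the image collection itself (f(x) is the number of images tagged with x, f(x, y) the number tagged with both, and N the collection size); the abstract does not state which counts are used, and the function name and argument names are hypothetical.

```python
import math


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance between two tags x and y.

    Assumed inputs (not specified in the abstract):
      fx, fy : number of images carrying tag x, tag y
      fxy    : number of images carrying both tags
      n      : total number of images in the collection
    Smaller values mean the two tags are more closely related.
    """
    if not (0 < fx <= n and 0 < fy <= n):
        raise ValueError("tag counts must satisfy 0 < count <= n")
    if fxy <= 0:
        # Tags that never co-occur are treated as infinitely distant.
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0.0:
        # Both tags appear on every image, so they are indistinguishable.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator


# Hypothetical counts, for illustration only.
d_same = normalized_google_distance(100, 100, 100, 10_000)
d_related = normalized_google_distance(1_000, 100, 10, 100_000)
d_unrelated = normalized_google_distance(1_000, 100, 0, 100_000)
```

A pairwise distance matrix built this way could serve as input to a first-layer tag clustering; which clustering algorithm consumes it is left open here.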
