Tag Similarity in Folksonomies

Folksonomies, collections of user-contributed tags, have proved effective in reducing the semantic gap inherent in retrieving web content. To make the best use of folksonomies, tag clustering has been proposed to address the problems caused by free-style user tagging, such as lexical variation, tag split, and multilingualism. In this paper, we propose a novel approach for identifying similar tags in folksonomies. It is based on the idea that the most frequent tags in a folksonomy can be used to identify groups of semantically related tags. First, the frequent tags are identified and their co-occurrence statistics are used to create a probability distribution for each of them. The frequent tags are then clustered based on the distance between their co-occurrence probability distributions. Next, probability distributions for the less frequent tags are generated from their co-occurrence with the clusters of the most frequent tags. Finally, similar tags are identified by calculating the distance between the corresponding probability distributions. To that end, we propose an extension of the Jensen-Shannon divergence that is sensitive to the size of the sample from which the co-occurrence probability distributions are calculated. We evaluated our approach by applying it to folksonomies obtained from Flickr and compared our results with those produced by a traditional tag clustering method, which identifies similar tags by calculating the cosine similarity between their co-occurrence vectors. The evaluation shows promising results and highlights the advantage of our approach.
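
The two distance notions named in the abstract can be illustrated with a minimal sketch. It builds a co-occurrence probability distribution for a tag over a set of frequent tags, then compares two tags with the standard Jensen-Shannon divergence and with cosine similarity. This is not the proposed method: the sample-size-sensitive extension is not specified in this abstract, so only the unmodified divergence is shown, and the clustering steps are omitted. The function names, the representation of a post as a set of tags, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def cooccurrence_distribution(tag, posts, frequent_tags):
    """Relative frequency with which `tag` co-occurs with each frequent tag.

    `posts` is an iterable of tag sets (one set per tagged resource);
    this representation is an assumption, not taken from the paper.
    """
    counts = Counter()
    for post in posts:
        if tag in post:
            for other in post:
                if other != tag and other in frequent_tags:
                    counts[other] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in frequent_tags}


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p[t] == 0 contribute 0."""
    return sum(p[t] * math.log2(p[t] / q[t]) for t in p if p[t] > 0)


def jensen_shannon(p, q):
    """Standard Jensen-Shannon divergence (base 2, so values lie in [0, 1])."""
    m = {t: 0.5 * (p[t] + q[t]) for t in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(u, v):
    """Cosine similarity between two co-occurrence vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy folksonomy (hypothetical data): each post is the tag set of one photo.
posts = [
    {"paris", "eiffel", "france"},
    {"paris", "louvre", "france"},
    {"eiffel", "tower", "france", "paris"},
    {"berlin", "germany"},
]
frequent_tags = ["paris", "france", "germany"]

eiffel = cooccurrence_distribution("eiffel", posts, frequent_tags)
louvre = cooccurrence_distribution("louvre", posts, frequent_tags)
berlin = cooccurrence_distribution("berlin", posts, frequent_tags)

print(jensen_shannon(eiffel, louvre))  # small: same co-occurrence profile
print(jensen_shannon(eiffel, berlin))  # large: disjoint co-occurrence profiles
print(cosine_similarity(eiffel, louvre))
```

In this sketch a tag that co-occurs with a frequent tag only once receives the same distribution as one that co-occurs many times in the same proportions, which is the kind of sample-size insensitivity the proposed extension is meant to address.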
