Evaluating similarity measures for emergent semantics of social tagging

Social bookmarking systems are becoming increasingly important data sources for bootstrapping and maintaining Semantic Web applications. Their emergent information structures have become known as folksonomies. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as community detection, navigation support, semantic search, user profiling and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures, which are derived from several established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity between tags and between resources and consider different methods to aggregate annotations across users. After comparing the ability of several tag similarity measures to predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory Project. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
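
To make the idea of aggregating annotations across users more concrete, the following is a minimal sketch and not the evaluation framework described above. It assumes a folksonomy stored as (user, resource, tag) triples, and it uses cosine similarity as a simple stand-in for the measures compared in the paper (it does not implement mutual information or collaborative aggregation). The data, the function names `aggregate` and `cosine`, and the `distributional` flag are all hypothetical. With `distributional=False` the user dimension is dropped to a binary tag-resource relation; with `distributional=True` each tag-resource pair is weighted by the number of users who made that annotation.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy folksonomy: (user, resource, tag) triples.
TRIPLES = [
    ("alice", "news.example", "news"),
    ("alice", "news.example", "media"),
    ("bob", "news.example", "news"),
    ("bob", "news.example", "media"),
    ("bob", "wiki.example", "reference"),
    ("carol", "wiki.example", "reference"),
    ("carol", "wiki.example", "media"),
]


def aggregate(triples, distributional=True):
    """Collapse the user dimension into tag -> {resource: weight}.

    distributional=True  : weight = number of distinct users who applied
                           the tag to the resource.
    distributional=False : weight = 1 if any user did (binary projection).
    """
    users = defaultdict(set)
    for user, resource, tag in triples:
        users[(tag, resource)].add(user)

    vectors = defaultdict(dict)
    for (tag, resource), who in users.items():
        vectors[tag][resource] = len(who) if distributional else 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    for flag in (False, True):
        vecs = aggregate(TRIPLES, distributional=flag)
        label = "distributional" if flag else "projection"
        print(label, round(cosine(vecs["news"], vecs["media"]), 3))
```

In this toy data the two aggregation schemes give different similarities for the same tag pair, because the distributional variant keeps the information that two users agreed on an annotation. Resource similarity can be sketched symmetrically by swapping the roles of tags and resources in `aggregate`.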
