Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems

Social bookmarking systems allow users to organize collections of Web resources collaboratively. The increasing popularity of these systems, together with first insights into their emergent semantics, has made them relevant to disciplines such as knowledge extraction and ontology learning. However, the problem of devising methods to measure the semantic relatedness between tags, and of characterizing that relatedness semantically, is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from the social bookmarking system del.icio.us. To ground our findings semantically, we establish a connection to WordNet (a semantic lexicon for the English language) by mapping tags to WordNet synonym sets and applying well-known metrics of semantic similarity there. Our results clearly expose different characteristics of the selected measures of relatedness, making them applicable to different subtasks of knowledge extraction such as synonym detection or discovery of concept hierarchies.
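To make the first two measures concrete, the following is a minimal sketch and not the authors' implementation. It assumes a post is the set of tags one user assigned to one resource, counts co-occurrence within posts, and compares tags by the cosine of their raw co-occurrence vectors. The toy data and the function names `cooccurrence` and `cosine` are hypothetical, and the exact weighting used in the study is not stated in the abstract.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each post is the set of tags that one user
# assigned to one resource.
posts = [
    {"python", "programming", "tutorial"},
    {"python", "programming", "code"},
    {"java", "programming", "code"},
    {"java", "programming", "tutorial"},
    {"recipes", "cooking"},
]


def cooccurrence(posts):
    """For each tag, count how often every other tag shares a post with it."""
    counts = defaultdict(Counter)
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


cooc = cooccurrence(posts)

# Direct co-occurrence: how often two tags appear in the same post.
print(cooc["python"]["programming"], cooc["python"]["java"])

# Distributional similarity: how alike the co-occurrence contexts are.
print(cosine(cooc["python"], cooc["java"]))
print(cosine(cooc["python"], cooc["cooking"]))
```

In this toy data "python" and "java" never share a post, so their direct co-occurrence is zero, yet they appear with the same other tags and therefore have near-identical co-occurrence vectors. This illustrates how the two measures can rank the same tag pair very differently.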
