Reorganizing clouds: A study on tag clustering and evaluation

Highlights

- Representing tags by co-occurrences yields more accurate clusters than representing them by content.
- Co-occurrence-based representations significantly reduce the computational cost of the process.
- Some of the studied functions for weighting co-occurrences outperform approaches used in earlier work.
- Our dataset offers a reliable basis for building approaches that find tag relations and for evaluating them on sound quantitative criteria.
- Language modeling techniques help reveal that the usage of some tags is biased toward unexpected meanings.

Finding and visualizing semantic relations among tags within a tag cloud enhances user experience, particularly regarding access to and retrieval of web pages on social tagging systems. Several approaches have been proposed to visualize tag relations in these systems. However, previous results rely on qualitative evaluation methods and do not provide robust and sound comparison criteria. To allow quantitative evaluation, we present a benchmark social tagging dataset in which a subset of 140 tags from a well-known social bookmarking site, Delicious, has been manually categorized according to the Open Directory Project (ODP). The manual categorization serves as a ground truth that enables quantitative evaluation, providing a way to identify the best of several clustering approaches. With this dataset we also explore different tag representation approaches to present a reorganized tag cloud using self-organizing maps. In addition, we present an approach to enrich the resulting tag cloud with the most characteristic terms for each tag and group of tags, enabling further filtered navigation by both tag and document content and easing a deeper qualitative evaluation of the clusters.
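As a rough illustration of what representing tags by co-occurrences can look like, the sketch below builds one co-occurrence vector per tag from a handful of bookmarks and compares tags with cosine similarity. It is not the weighting scheme studied in the paper: raw co-occurrence counts are used as an assumption, and the bookmark data and the function names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(bookmarks):
    """Map each tag to a Counter of the tags it was assigned together with."""
    vectors = defaultdict(Counter)
    for tags in bookmarks:
        # Count each unordered pair of distinct tags once per bookmark.
        for a, b in combinations(sorted(set(tags)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy data: each inner list is the tag set of one bookmark.
bookmarks = [
    ["python", "programming", "tutorial"],
    ["java", "programming", "tutorial"],
    ["recipes", "cooking", "food"],
    ["cooking", "food", "baking"],
]

vectors = cooccurrence_vectors(bookmarks)
sim_related = cosine(vectors["python"], vectors["java"])
sim_unrelated = cosine(vectors["python"], vectors["recipes"])
print(sim_related, sim_unrelated)
```

In this toy data, "python" and "java" never appear on the same bookmark yet share all of their neighbouring tags, so their vectors are nearly identical, while "python" and "recipes" share none. Vectors of this kind could then be passed to a clustering method such as a self-organizing map.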
