Experimental Comparison of Semantic Word Clouds

We study the problem of computing semantics-preserving word clouds, in which semantically related words are placed close to each other. We implement three earlier algorithms for creating word clouds and three new ones, and we define several metrics for quantitatively evaluating the resulting layouts. We then compare the algorithms according to these metrics, using two data sets of documents drawn from Wikipedia and from research papers. Two of our new algorithms outperform all the others, placing many more pairs of related words so that their bounding boxes are adjacent. Moreover, this improvement does not come at the expense of significantly worse measurements for the other metrics.
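As an illustration of the adjacency criterion mentioned above, the following is a minimal sketch of how one might count related word pairs whose bounding boxes touch. The abstract does not give the exact metric definition, so everything here is an assumption: the box format `(x, y, width, height)`, the tolerance `tol`, and the function names `boxes_touch` and `realized_adjacencies` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); assumed format


def boxes_touch(a: Box, b: Box, tol: float = 1e-9) -> bool:
    """Return True if two axis-aligned boxes share a boundary segment.

    Assumption: "adjacent" means the boxes meet along a side of positive
    length without overlapping; corner-only contact does not count.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis; a negative value means a gap.
    overlap_x = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_y = min(ay + ah, by + bh) - max(ay, by)
    side_by_side = abs(overlap_x) <= tol and overlap_y > tol
    stacked = abs(overlap_y) <= tol and overlap_x > tol
    return side_by_side or stacked


def realized_adjacencies(
    boxes: Dict[str, Box],
    related_pairs: Iterable[Tuple[str, str]],
    tol: float = 1e-9,
) -> int:
    """Count the related word pairs whose bounding boxes touch."""
    return sum(
        1 for u, v in related_pairs if boxes_touch(boxes[u], boxes[v], tol)
    )


# Hypothetical layout: four words and four related pairs.
layout = {
    "word": (0.0, 0.0, 4.0, 2.0),
    "cloud": (4.0, 0.0, 5.0, 2.0),    # directly to the right of "word"
    "layout": (0.0, 2.0, 3.0, 1.0),   # directly on top of "word"
    "metric": (20.0, 20.0, 3.0, 1.0), # far away from everything
}
pairs = [
    ("word", "cloud"),
    ("word", "layout"),
    ("cloud", "metric"),
    ("cloud", "layout"),
]
count = realized_adjacencies(layout, pairs)
```

A layout algorithm that scores well under a criterion of this kind places more of the related pairs in contact; a weighted variant, summing a similarity score per touching pair, would be an equally plausible reading.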
