Optimizing Hierarchical Visualizations with the Minimum Description Length Principle

In this paper we examine how the Minimum Description Length (MDL) principle can be used to efficiently select aggregated views of hierarchical datasets that balance clutter against information content. We present MDL formulae for generating uneven tree cuts tailored to treemap and sunburst diagrams, taking into account the available display space and the information content of the data. We present the results of a proof-of-concept implementation and, by integrating our approach into an existing visualization tool, demonstrate how such tree cuts can enhance drill-down interaction in hierarchical visualizations. We validate the approach using the feature congestion measure of clutter on views of a subset of the current DMOZ web directory, which contains nearly half a million categories. The results show that MDL views achieve a near-constant clutter level across display resolutions. We also report a crowdsourced user study in which participants were asked to find targets in views of DMOZ generated by our approach and by a set of baseline aggregation methods. The results suggest that, in some conditions, participants locate targets (in particular, outliers) faster using the proposed approach.
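To make the idea of an uneven tree cut concrete, the following is a minimal sketch of a generic two-part MDL tree-cut selection, in the style of the classic procedure from the natural-language-processing literature. It is not the display-aware formulation of this paper: the cost function ignores display space, and the names `Node`, `description_length`, and `find_mdl_cut` are hypothetical, chosen only for illustration. A cut is scored as model bits (half of log2 N per node in the cut) plus data bits (the code length of the leaf counts when each node in the cut spreads its mass uniformly over its leaves).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical hierarchy node; only leaves carry a count."""
    name: str
    count: int = 0
    children: list = field(default_factory=list)


def leaves(node):
    """Return all leaf nodes below (or equal to) `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def total(node):
    """Total count over the leaves of `node`."""
    return sum(leaf.count for leaf in leaves(node))


def description_length(cut, n):
    """Two-part code length in bits of a (partial) cut, for sample size n.

    Assumption: each node in the cut costs 0.5 * log2(n) model bits and
    distributes its probability mass uniformly over its leaves.
    """
    model_bits = 0.5 * len(cut) * math.log2(n)
    data_bits = 0.0
    for node in cut:
        below = leaves(node)
        mass = sum(leaf.count for leaf in below)
        if mass == 0:
            continue  # empty subtrees contribute no data bits
        p_leaf = (mass / n) / len(below)
        data_bits -= mass * math.log2(p_leaf)
    return model_bits + data_bits


def find_mdl_cut(node, n):
    """Bottom-up choice between collapsing `node` and keeping its children's cuts."""
    if not node.children:
        return [node]
    expanded = [m for child in node.children for m in find_mdl_cut(child, n)]
    collapsed = [node]
    if description_length(collapsed, n) <= description_length(expanded, n):
        return collapsed
    return expanded


# Toy hierarchy: subtree A is uniform, subtree B contains an outlier.
root = Node("root", children=[
    Node("A", children=[Node("a1", 25), Node("a2", 25)]),
    Node("B", children=[Node("b1", 49), Node("b2", 1)]),
])
cut = find_mdl_cut(root, total(root))
print([node.name for node in cut])  # ['A', 'b1', 'b2']
```

In this toy hierarchy the uniform subtree A is collapsed into a single aggregate, while the skewed subtree B stays expanded so that the outlier b2 remains visible. This is the kind of uneven cut the abstract refers to, here obtained without any display-space term.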
