Rate-Distortion Theory for Clustering in the Perceptual Space

Extracting relevant information from large data sets has become a major challenge in data visualization. Clustering techniques, which group data according to similarity metrics, are a suitable strategy for tackling this problem. These techniques are generally applied in the data space as an independent step prior to visualization. In this paper, we propose clustering in the perceptual space by maximizing the mutual information between the original data and the final visualization. To this end, we present a new information-theoretic framework based on rate-distortion theory that achieves maximal data compression with minimal signal distortion. Using this framework, we propose a methodology for designing a visualization process that minimizes the information lost during clustering. We present three application examples of the proposed methodology, covering different visualization techniques: scatterplots, parallel coordinates, and summary trees.
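
For orientation, the trade-off between compression and distortion mentioned above can be written as the standard rate-distortion function from information theory. This is the textbook definition, not necessarily the exact formulation used in the paper; the symbols are assumptions introduced here for illustration: X is the original data, \hat{X} its clustered (compressed) representation, d a distortion measure, D the tolerated distortion, and \beta a Lagrange multiplier.

```latex
% Textbook rate-distortion function (notation assumed, not taken from the paper):
% the minimum rate needed to represent X by \hat{X} with expected distortion at most D.
R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\;\mathbb{E}\left[d(X,\hat{X})\right]\,\le\, D} \; I(X;\hat{X})

% Equivalent unconstrained (Lagrangian) form, with \beta \ge 0 setting the trade-off:
\min_{p(\hat{x}\mid x)} \;\; I(X;\hat{X}) \;+\; \beta\,\mathbb{E}\left[d(X,\hat{X})\right]
```

In this form, a small \beta favors compression (fewer, coarser clusters), while a large \beta favors fidelity to the original data.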
