Just-in-time annotation of clusters, outliers, and trends in point-based data visualizations

We introduce the concept of just-in-time descriptive analytics as a novel application of computational and statistical techniques performed at interaction-time to help users easily understand the structure of data as seen in visualizations. Fundamental to just-intime descriptive analytics is (a) identifying visual features, such as clusters, outliers, and trends, user might observe in visualizations automatically, (b) determining the semantics of such features by performing statistical analysis as the user is interacting, and (c) enriching visualizations with annotations that not only describe semantics of visual features but also facilitate interaction to support high-level understanding of data. In this paper, we demonstrate just-in-time descriptive analytics applied to a point-based multi-dimensional visualization technique to identify and describe clusters, outliers, and trends. We argue that it provides a novel user experience of computational techniques working alongside of users allowing them to build faster qualitative mental models of data by demonstrating its application on a few use-cases. Techniques used to facilitate just-in-time descriptive analytics are described in detail along with their runtime performance characteristics. We believe this is just a starting point and much remains to be researched, as we discuss open issues and opportunities in improving accessibility and collaboration.

[1]  Stefan Berchtold,et al.  Similarity clustering of dimensions for an enhanced visualization of multidimensional data , 1998, Proceedings IEEE Symposium on Information Visualization (Cat. No.98TB100258).

[2]  Haim Levkowitz,et al.  Least Square Projection: A Fast High-Precision Multidimensional Projection Technique and Its Application to Document Mapping , 2008, IEEE Transactions on Visualization and Computer Graphics.

[3]  Matthew O. Ward,et al.  Visual Hierarchical Dimension Reduction for Exploration of High Dimensional Datasets , 2003, VisSym.

[4]  Mark Gahegan,et al.  Breaking Down Dimensionality : Effective and Efficient Feature Selection for High-Dimensional Clustering , 2003 .

[5]  Enrico Bertini,et al.  Quality Metrics in High-Dimensional Data Visualization: An Overview and Systematization , 2011, IEEE Transactions on Visualization and Computer Graphics.

[6]  Daniel Asimov,et al.  The grand tour: a tool for viewing multidimensional data , 1985 .

[7]  Scott Barlowe,et al.  Click2Annotate: Automated Insight Externalization with rich semantics , 2010, 2010 IEEE Symposium on Visual Analytics Science and Technology.

[8]  Kwan-Liu Ma,et al.  StarClass: Interactive Visual Classification using Star Coordinates , 2003, SDM.

[9]  P. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 1999 .

[10]  Arie Rip,et al.  The Computer Revolution in Science: Steps Towards the Realization of Computer-Supported Discovery Environments , 1997, Artif. Intell..

[11]  Daniel A. Keim,et al.  Information Visualization and Visual Data Mining , 2002, IEEE Trans. Vis. Comput. Graph..

[12]  Aidong Zhang,et al.  WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases , 1998, VLDB.

[13]  M. V. Velzen,et al.  Self-organizing maps , 2007 .

[14]  Ben Shneiderman,et al.  Inventing Discovery Tools: Combining Information Visualization with Data Mining1 , 2001, Inf. Vis..

[15]  Robert L. Grossman,et al.  Graph-Theoretic Scagnostics , 2005, INFOVIS.

[16]  S. Johansson,et al.  Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics , 2009, IEEE Transactions on Visualization and Computer Graphics.

[17]  Daniel A. Keim,et al.  Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data , 2010, AVI.

[18]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[19]  Hans-Peter Kriegel,et al.  Towards an Effective Cooperation of the Computer and the User for Classification , 2000, KDD 2000.

[20]  Daniel A. Keim,et al.  Pixnostics: Towards Measuring the Value of Visualization , 2006, 2006 IEEE Symposium On Visual Analytics Science And Technology.

[21]  Luis Gustavo Nonato,et al.  Local Affine Multidimensional Projection , 2011, IEEE Transactions on Visualization and Computer Graphics.

[22]  Hans-Peter Kriegel,et al.  Visual classification: an interactive approach to decision tree construction , 1999, KDD '99.

[23]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[24]  Matthew O. Ward,et al.  Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering , 2004, IEEE Symposium on Information Visualization.

[25]  Matthew O. Ward,et al.  Interactive hierarchical dimension ordering, spacing and filtering for exploration of high dimensional datasets , 2003, IEEE Symposium on Information Visualization 2003 (IEEE Cat. No.03TH8714).

[26]  Marc Olano,et al.  Glimmer: Multilevel MDS on the GPU , 2009, IEEE Transactions on Visualization and Computer Graphics.

[27]  Marcus A. Magnor,et al.  Combining automated analysis and visualization techniques for effective exploration of high-dimensional data , 2009, 2009 IEEE Symposium on Visual Analytics Science and Technology.

[28]  Andreas Buja,et al.  Grand tour and projection pursuit , 1995 .

[29]  David G. Lowe,et al.  Perceptual Organization and Visual Recognition , 2012 .

[30]  Anthony Chiu Wa Lo,et al.  Cluster validity analysis using subsampling , 2003, SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483).

[31]  Ben Shneiderman,et al.  A Rank-by-Feature Framework for Unsupervised Multidimensional Data Exploration Using Low Dimensional Projections , 2004, IEEE Symposium on Information Visualization.

[32]  Mihael Ankerst,et al.  Visual Data Mining , 2001, Encyclopedia of GIS.

[33]  Matthew O. Ward,et al.  Managing discoveries in the visual analytics process , 2007, SKDD.

[34]  Ian T. Jolliffe,et al.  Principal Component Analysis , 2002, International Encyclopedia of Statistical Science.

[35]  R. Suganya,et al.  Data Mining Concepts and Techniques , 2010 .

[36]  Patrick J. F. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 2003 .

[37]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[38]  Diansheng Guo,et al.  Coordinating Computational and Visual Approaches for Interactive Feature Selection and Multivariate Clustering , 2003, Inf. Vis..

[39]  Teuvo Kohonen,et al.  Self-Organizing Maps, Second Edition , 1997, Springer Series in Information Sciences.

[40]  John W. Tukey,et al.  A Projection Pursuit Algorithm for Exploratory Data Analysis , 1974, IEEE Transactions on Computers.

[41]  Eser Kandogan,et al.  Visualizing multi-dimensional clusters, trends, and outliers using star coordinates , 2001, KDD '01.

[42]  Ronald A. Rensink,et al.  The Perception of Correlation in Scatterplots , 2010, Comput. Graph. Forum.