Cluster Sculptor, an interactive visual clustering system

Abstract: This paper describes Cluster Sculptor, a novel interactive clustering system that allows a user to iteratively update both the cluster labels of a data set and an associated low-dimensional projection. The system takes as input clustering results computed in a high-dimensional space, and uses a two-dimensional (2D) projection both as a support for overlaying the cluster labels and as a surface for user interaction. By interacting directly with elements in the visualization, the user can inject his or her domain knowledge progressively. Via interactive controls, the distribution of the data in the 2D space can be used to amend the cluster labels. Reciprocally, the 2D projection can be updated so as to emphasize the current clusters. The 2D projection updates follow a smooth physical metaphor that gives the user insight into the process. Updates can be interrupted at any time, for further data inspection or to modify the input preferences. The value of the system is demonstrated through detailed experimental scenarios on three real data sets.
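
The two reciprocal updates mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes, purely for illustration, that labels are amended by reassigning each point to the nearest cluster centroid in 2D, and that the projection is updated by a spring-like pull of each point toward its cluster centroid. All function names (`centroids`, `attract_step`, `relabel_by_nearest_centroid`) and the `rate` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist


def centroids(points, labels):
    """Mean 2D position of each cluster label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), lab in zip(points, labels):
        s = sums[lab]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}


def attract_step(points, labels, rate=0.1):
    """Assumed projection update: move each point a fraction `rate`
    of the way toward the centroid of its current cluster."""
    cents = centroids(points, labels)
    return [
        (x + rate * (cents[lab][0] - x), y + rate * (cents[lab][1] - y))
        for (x, y), lab in zip(points, labels)
    ]


def relabel_by_nearest_centroid(points, labels):
    """Assumed label update: give each point the label of the
    closest cluster centroid in the 2D space."""
    cents = centroids(points, labels)
    return [min(cents, key=lambda lab: dist(p, cents[lab])) for p in points]


# Toy data: the last point sits among the "b" points but is labeled "a".
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (9.0, 0.0)]
labels = ["a", "a", "b", "b", "a"]

# 2D layout amends the labels ...
labels = relabel_by_nearest_centroid(points, labels)

# ... and the labels then reshape the 2D layout, one small step at a time.
# The loop can be stopped after any iteration, mirroring interruptible updates.
for _ in range(20):
    points = attract_step(points, labels)
```

Taking many small steps rather than one jump is what makes the motion smooth and interruptible in this sketch; a real system would presumably also preserve within-cluster structure, which this toy version does not attempt.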
