Towards Glyph-Based Visualizations for Big Data Clustering

Data analysts have to deal with an ever-growing amount of data. One way to make sense of this data is to extract features and use clustering algorithms to group items according to a similarity measure. Algorithm developers find it challenging to evaluate the performance of such algorithms because it is hard to identify the features that influence the clustering. Moreover, many algorithms can be trained using a semi-supervised approach, in which human users provide ground-truth samples by manually grouping individual items. Hence, visualization techniques are needed that help data analysts evaluate big data clustering algorithms. In this context, Multidimensional Scaling (MDS) has become a prominent visualization tool. In this paper, we propose combining MDS with glyphs that provide a detailed view of the specific features involved in MDS. As a consequence, human users can understand, adjust, and ultimately improve clustering algorithms. We present a thorough glyph design, which is grounded in a comprehensive survey of related work, and report the results of a controlled experiment in which participants solved data analysis tasks with both glyphs and a traditional textual display of data values.
