Guided discovery of interesting relationships between time series clusters and metadata properties

Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships among them. Often, the clusters found need to be understood in the context of the categorical, numerical, or textual metadata that accompany the data elements. Although such metadata are often not part of the clustering process, they play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow analysts to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying cluster data. However, manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. A fully interactive search for potentially useful or interesting cluster-to-metadata relationships can be a long and cumbersome process. To remedy this problem, we propose a novel approach that guides users through the potentially huge search space toward interesting relationships between clusters and associated metadata. We focus on metadata of categorical type, which can be summarized for a cluster in the form of a histogram. Starting from a given visual cluster representation, we compute measures of interestingness defined on the distribution of metadata categories within each cluster. These measures are used to automatically score and rank the clusters by potential interestingness with respect to the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing a comprehensive yet extensible set of interestingness scores for categorical metadata, which can also be extended to numerical metadata.
Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are automatically identified, supporting the analyst in discovering interesting and visually understandable relationships.
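
The scoring-and-ranking idea described above can be illustrated with a minimal sketch. The abstract does not name the specific interestingness measures used, so this example assumes one simple candidate, the sum of squared category proportions (a Simpson-style concentration score), purely for illustration. The function names, cluster identifiers, and category labels are hypothetical and are not taken from the described system.

```python
from collections import Counter


def category_histogram(labels):
    """Relative frequency of each metadata category within one cluster."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def concentration_score(histogram):
    """Sum of squared proportions (assumed measure, not the paper's).

    Equals 1.0 when a single category fills the cluster and approaches
    1/k when k categories are evenly mixed.
    """
    return sum(p * p for p in histogram.values())


def rank_clusters(cluster_labels):
    """Score every non-empty cluster and sort from most to least concentrated."""
    scores = {
        cluster_id: concentration_score(category_histogram(labels))
        for cluster_id, labels in cluster_labels.items()
        if labels
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input: cluster id -> categorical metadata value of each member.
clusters = {
    "c0": ["station_A", "station_A", "station_A", "station_A"],
    "c1": ["station_A", "station_B", "station_A", "station_B"],
    "c2": ["station_A", "station_B", "station_C", "station_D"],
}

ranking = rank_clusters(clusters)
for cluster_id, score in ranking:
    print(cluster_id, round(score, 3))
```

In this sketch, a cluster dominated by one category ranks highest, which is one plausible notion of an "interesting" cluster-to-metadata relationship. A visual front end could then highlight the top-ranked clusters, and other measures could be swapped in for `concentration_score` without changing the ranking step.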
