Visual Analytics of Co-Occurrences to Discover Subspaces in Structured Data

We present an approach that shows all relevant subspaces of categorical data condensed in a single picture. We model the categorical values of the attributes as co-occurrences with data partitions that are generated from structured data using pattern mining. We show that these co-occurrences are a-priori, which allows us to greatly reduce the search space and generate the condensed picture, whereas conventional approaches filter out several subspaces because they are deemed insignificant. Identifying interesting subspaces is a common task, but it is difficult because of exponential search spaces and the curse of dimensionality. One application is identifying a cohort of patients, defined by attributes such as gender, age, and diabetes type, who share a common patient history modeled as event sequences. Filtering the data by these attributes is common but cumbersome, and it often does not allow subspaces to be compared. We contribute a multi-dimensional pattern exploration approach (MDPE-approach) that is agnostic to the structured data type and models multiple attributes and their characteristics as co-occurrences, allowing the user to identify and compare thousands of subspaces of interest in a single picture. In our MDPE-approach, we introduce two methods that dramatically reduce the search space and output only the boundaries of the search space in the form of two tables. We implement the MDPE-approach in an interactive visual interface (MDPE-vis) with a scalable, pixel-based visualization design that supports the identification, comparison, and sense-making of subspaces in structured data. Two case studies, one using a gold-standard dataset and one with external domain experts, confirm the applicability of our approach and its implementation. A third use case sheds light on the scalability of our approach, and a user study with 15 participants underlines its usefulness and power.
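The search-space reduction described above can be illustrated with a minimal sketch. It assumes that "a-priori" refers to the anti-monotone property known from Apriori-style frequent pattern mining: a combination of attribute values can only be frequent if every sub-combination is frequent. The function name `frequent_subspaces`, the `min_support` threshold, and the toy patient records are hypothetical, and support is counted over plain records rather than over mined pattern partitions, so this is not the MDPE implementation.

```python
from itertools import combinations


def frequent_subspaces(records, min_support):
    """Level-wise search over attribute=value combinations (hypothetical helper).

    Returns a dict mapping each frequent combination (a frozenset of
    (attribute, value) pairs) to the number of records that contain it.
    """
    # Level 1: count every single attribute=value pair.
    counts = {}
    for rec in records:
        for item in rec.items():
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: c for k, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current:
        size += 1
        candidates = set()
        for a, b in combinations(list(current), 2):
            cand = a | b
            if len(cand) != size:
                continue
            # A subspace uses each attribute at most once.
            if len({attr for attr, _ in cand}) != size:
                continue
            # Pruning step: every (size-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(cand, size - 1)):
                candidates.add(cand)

        # Count support only for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for rec in records:
            items = set(rec.items())
            for c in candidates:
                if c <= items:
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
    return result


# Toy records (invented for illustration only).
records = [
    {"gender": "f", "age": "60+", "diabetes": "type2"},
    {"gender": "f", "age": "60+", "diabetes": "type2"},
    {"gender": "m", "age": "60+", "diabetes": "type1"},
    {"gender": "f", "age": "<40", "diabetes": "type1"},
]
result = frequent_subspaces(records, min_support=2)
for subspace, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(subspace), support)
```

Because infrequent single values such as `gender=m` are discarded at the first level, no larger combination containing them is ever generated or counted, which is the kind of pruning that keeps an exponential space of subspaces tractable.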
