SSRDVis: Interactive visualization for event sequences summarization and rare detection

Abstract: This paper presents SSRDVis, a visual approach for summarizing event sequences and interactively detecting rare behaviors. SSRDVis has three main components: (1) a sequence embedding module that learns feature vectors for sequences, (2) a sequence grouping and summarization module that finds representative clusters and patterns in the dataset, and (3) a rare detection module that discovers and explains rare cases. Sequences are embedded into a vector space via “mixed-ngram2vec,” which is adapted from “word2vec.” Unsupervised learning models can then be applied in that vector space to group similar sequences and detect anomalies. In addition, sequential pattern graphs are built to provide a compact, semantic summary of the sequences. Together, these components present both the overall sequential patterns and the abnormal behaviors in a single visual interface. We demonstrate the feasibility of the approach by applying it to Web clickstream analysis. Experimental results show that the approach can help identify noticeable patterns in a large number of event sequences, especially rare behaviors.
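
To make the embed-then-detect idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that "mixed" n-grams means n-grams of several lengths taken together, and it swaps in plain n-gram count vectors for the learned mixed-ngram2vec embeddings and a mean cosine dissimilarity for the unsupervised anomaly model. The function names, the `max_n` parameter, and the sample clickstream sessions are all hypothetical.

```python
from collections import Counter
import math


def mixed_ngrams(seq, max_n=2):
    """Return every n-gram of length 1..max_n in seq, as tuples."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            grams.append(tuple(seq[i:i + n]))
    return grams


def embed(sequences, max_n=2):
    """Count-vector stand-in for learned embeddings (assumption, not word2vec)."""
    bags = [Counter(mixed_ngrams(s, max_n)) for s in sequences]
    vocab = sorted({g for bag in bags for g in bag})
    return [[bag[g] for g in vocab] for bag in bags], vocab


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rarity_scores(vectors):
    """Score each sequence as 1 minus its mean similarity to all others."""
    if len(vectors) < 2:
        return [0.0 for _ in vectors]
    scores = []
    for i, u in enumerate(vectors):
        sims = [cosine(u, v) for j, v in enumerate(vectors) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Hypothetical clickstream sessions; the last one shares no events with the rest.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "cart"],
    ["home", "search", "product"],
    ["login", "admin", "export", "export"],
]
vectors, vocab = embed(sessions)
scores = rarity_scores(vectors)
rarest = scores.index(max(scores))
print(rarest)  # → 3
```

A learned embedding and a dedicated anomaly detector would replace `embed` and `rarity_scores` in a real pipeline; the sketch only shows how sequences become vectors that can then be compared and ranked by rarity.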
