Cluster sequence mining from event sequence data and its application to damage correlation analysis

Abstract: We propose a novel mining algorithm, cluster sequence mining (CSM), to extract cluster pairs with correlated occurrences from event sequence data. CSM extracts patterns consisting of a pair of clusters in which each cluster satisfies spatial proximity and events from the two clusters satisfy temporal proximity, measured by the time intervals between them. CSM extends the co-occurring cluster mining (CCM) algorithm by considering the order of event occurrences and the distribution of time intervals. The probability density of the time intervals is inferred by Bayesian inference for robustness against uncertainty. To improve the accuracy of this inference, we use dynamic programming (DP) matching to obtain the correspondence between multiple event occurrences. In an experiment using synthetic data, we confirm that CSM extracts clusters with a high F-measure and a low estimation error of the time interval distribution, even under uncertainty. We also find that DP matching can improve the inference accuracy of the time interval density function. Finally, we apply CSM to a real-world acoustic emission event sequence data set to evaluate damage interactions in a fuel cell.
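The abstract does not spell out how DP matching is formulated, so the following is only a minimal sketch under stated assumptions. It illustrates why a one-to-one correspondence matters before estimating a time interval density. The sketch assumes a Sakoe-Chiba-style alignment between sorted event times of a "cause" cluster `a` and an "effect" cluster `b`. A pair is admissible only when the effect follows the cause within a window `max_lag`. Matching costs the interval length, and leaving an event unmatched costs a penalty `gap`. The function name `dp_match` and the parameters `max_lag` and `gap` are hypothetical, and so is this cost design.

```python
from typing import List, Optional, Tuple


def dp_match(a: List[float], b: List[float], max_lag: float,
             gap: Optional[float] = None) -> List[Tuple[int, int]]:
    """Monotone one-to-one matching of sorted cause times a to effect times b.

    Hypothetical formulation (not taken from the paper):
    - a pair (i, j) is admissible only if 0 <= b[j] - a[i] <= max_lag;
    - matching an admissible pair costs the interval b[j] - a[i];
    - leaving any event unmatched costs `gap`.
    Returns matched index pairs in time order.
    """
    if gap is None:
        gap = max_lag  # assumed default: any admissible match beats skipping both events
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, step = inf, None
            if i > 0 and cost[i - 1][j] + gap < best:      # leave a[i-1] unmatched
                best, step = cost[i - 1][j] + gap, "skip_a"
            if j > 0 and cost[i][j - 1] + gap < best:      # leave b[j-1] unmatched
                best, step = cost[i][j - 1] + gap, "skip_b"
            if i > 0 and j > 0:                            # match a[i-1] with b[j-1]
                d = b[j - 1] - a[i - 1]
                if 0 <= d <= max_lag and cost[i - 1][j - 1] + d < best:
                    best, step = cost[i - 1][j - 1] + d, "match"
            cost[i][j], move[i][j] = best, step
    # Backtrack from (n, m) to recover the optimal correspondence.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_a":
            i -= 1
        else:  # "skip_b"
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    a = [0.0, 1.0, 10.0]   # cause-cluster event times
    b = [2.0, 3.0, 12.0]   # effect-cluster event times
    pairs = dp_match(a, b, max_lag=4.0)
    print(pairs)                                   # [(0, 0), (1, 1), (2, 2)]
    print([b[j] - a[i] for i, j in pairs])         # [2.0, 2.0, 2.0]
```

In this toy input, taking every admissible pair within `max_lag` would yield five intervals (2, 3, 1, 2, 2). The monotone one-to-one matching yields three, one per event. This is the kind of spurious-interval reduction that could make a subsequent density estimate less biased. Whatever Bayesian model CSM actually uses for the interval density is not reproduced here.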
