Cluster Sequence Mining: Causal Inference with Time and Space Proximity Under Uncertainty

We propose a pattern mining algorithm for numerical multidimensional event sequences, called cluster sequence mining (CSM). CSM extracts patterns consisting of a pair of clusters that satisfy spatial proximity within each cluster and temporal proximity in the time intervals between events from different clusters. CSM extends co-occurrence cluster mining (CCM) by considering the order of events and the distribution of time intervals. The probability density of the time intervals is estimated with Bayesian inference for robustness against uncertainty. In an experiment using synthetic data, we confirmed that CSM extracts clusters with a high F-measure and a low estimation error for the time interval distribution, even under uncertainty. We applied CSM to an earthquake event sequence in Japan after the 2011 Tohoku Earthquake to infer causality among earthquake occurrences. The results show that CSM identifies several highly affecting and highly affected areas in the subduction zone farther away from the main shock of the Tohoku Earthquake.
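To make the idea of a cluster pair with a time interval distribution concrete, the following is a minimal sketch and not the CSM algorithm itself. It assumes two fixed circular clusters (a center and a shared radius), pairs each event in the first cluster with the next event in the second cluster within a maximum lag, and updates a Gamma prior on the rate of an exponential interval model. The exponential likelihood, the Gamma prior, and all names (`Event`, `intervals`, `gamma_posterior`, `max_lag`) are illustrative assumptions; the abstract does not specify the model CSM uses.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Event:
    t: float      # occurrence time
    pos: tuple    # spatial coordinates, e.g. (x, y)


def in_cluster(event, center, radius):
    """Space proximity: the event lies within `radius` of the cluster center."""
    return dist(event.pos, center) <= radius


def intervals(events, center_a, center_b, radius, max_lag):
    """Collect, for each event in cluster A, the lag to the next event in
    cluster B, provided that lag is positive and at most `max_lag`."""
    ordered = sorted(events, key=lambda e: e.t)
    a_times = [e.t for e in ordered if in_cluster(e, center_a, radius)]
    b_times = [e.t for e in ordered if in_cluster(e, center_b, radius)]
    lags = []
    for ta in a_times:
        following = [tb - ta for tb in b_times if 0 < tb - ta <= max_lag]
        if following:
            lags.append(min(following))
    return lags


def gamma_posterior(lags, shape0=1.0, rate0=1.0):
    """Conjugate update for an exponential interval model (an assumption):
    a Gamma(shape0, rate0) prior on the rate becomes
    Gamma(shape0 + n, rate0 + sum of lags)."""
    return shape0 + len(lags), rate0 + sum(lags)


# Hypothetical toy data: events alternate between two locations.
events = [
    Event(0.0, (0.0, 0.0)),
    Event(1.0, (5.0, 5.0)),
    Event(3.0, (0.1, 0.0)),
    Event(5.0, (5.1, 5.0)),
    Event(9.0, (0.0, 0.1)),
]
lags = intervals(events, (0.0, 0.0), (5.0, 5.0), radius=0.5, max_lag=3.0)
shape, rate = gamma_posterior(lags)
print(lags, shape, rate, shape / rate)  # lags, posterior parameters, posterior mean rate
```

Because the prior contributes pseudo-observations, the posterior stays well defined even when only a few intervals are observed, which is one simple way a Bayesian estimate can remain stable under sparse or noisy data.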