Efficient and Distributed Temporal Pattern Mining

The widespread deployment of IoT systems has enabled the generation and collection of enormous amounts of sensor time series. An important technique for extracting patterns from time series is temporal pattern mining (TPM). Unlike sequential pattern mining, TPM adds a temporal dimension, i.e., time intervals, to the extracted patterns, making them more informative. However, this extra dimension adds an exponential factor to the growth of the search space and thus significantly increases the mining complexity. Current TPM approaches run sequentially and therefore cannot scale to large datasets. In this paper, we propose Distributed Hierarchical Pattern Graph TPM (DHPG-TPM), the first distributed solution that supports large-scale TPM, built on the distributed computing platform Apache Spark. DHPG-TPM employs two data structures, a distributed bitmap and a distributed Hierarchical Pattern Graph, that are carefully designed to work efficiently in a distributed environment and to enable fast computation of support and confidence. To address the exponential search space of TPM, we design distributed pruning techniques based on the Apriori principle and the transitivity property of temporal relations, which reduce the search space while minimizing the communication overhead between cluster nodes. We conduct extensive experiments on real-world and synthetic datasets, showing that DHPG-TPM outperforms the sequential baselines and scales to very large datasets.
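To make the role of bitmaps and Apriori-based pruning concrete, the following is a minimal single-machine sketch, not the DHPG-TPM implementation. It assumes each event gets one bitmap (a Python integer) whose bit i is set when the event occurs in sequence i, and it takes confidence to be the conventional ratio supp(a, b) / supp(a); the paper may define confidence differently. The function names and the toy data are hypothetical, and the sketch checks co-occurrence only, omitting the temporal relations between intervals and all Spark distribution.

```python
from itertools import combinations


def build_bitmaps(sequences):
    """Map each event to an int whose bit i is set if the event occurs in sequence i."""
    bitmaps = {}
    for i, events in enumerate(sequences):
        for event in events:
            bitmaps[event] = bitmaps.get(event, 0) | (1 << i)
    return bitmaps


def support(bitmap, n):
    """Fraction of the n sequences whose bit is set in the bitmap."""
    return bin(bitmap).count("1") / n


def frequent_pairs(sequences, min_support):
    """Return {(a, b): (support, confidence)} for co-occurring frequent event pairs."""
    n = len(sequences)
    bitmaps = build_bitmaps(sequences)
    # Apriori principle: a pair can only be frequent if both of its events are,
    # so infrequent single events are discarded before any pair is formed.
    frequent = sorted(e for e, b in bitmaps.items() if support(b, n) >= min_support)
    result = {}
    for a, b in combinations(frequent, 2):
        joint = bitmaps[a] & bitmaps[b]  # sequences that contain both events
        s = support(joint, n)
        if s >= min_support:
            # Assumed confidence definition: supp(a, b) / supp(a).
            result[(a, b)] = (s, s / support(bitmaps[a], n))
    return result


# Hypothetical toy data: four sequences, each reduced to the set of events it contains.
sequences = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
pairs = frequent_pairs(sequences, min_support=0.5)
```

With this toy input, the pairs (A, B) and (B, C) meet the 0.5 support threshold, while (A, C) is dropped because the two events co-occur in only one of the four sequences. The bitwise AND replaces a scan over the sequences, which is the property that makes bitmaps attractive for support counting.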