Finding K Most Significant Motifs in Big Time Series Data

Abstract An efficient algorithm for discovering frequently occurring patterns, called motifs, in a time series would be a useful tool for summarizing and visualizing big time series databases. In this paper, we propose an efficient approximate algorithm, called DiscMotifs, to discover the K most significant (KMS) motifs in a time series. First, the algorithm transforms the time series into a SAX representation and divides that representation into subsequences. Next, these subsequences are linearized by projecting them onto a one-dimensional space based on their distances from a randomly selected reference point (itself a subsequence). By exploiting this linear ordering of subsequences, DiscMotifs efficiently discovers the KMS motifs. The DiscMotifs algorithm requires storage space linear in the number of subsequences. We demonstrate the feasibility of this approach on several synthetic and real application datasets.
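
The pipeline described in the abstract (SAX transformation, subsequence extraction, ordering by distance to a random reference) can be illustrated with a minimal Python sketch. It is not the DiscMotifs implementation. The abstract does not specify the alphabet size, the distance measure, how significance is scored, or how the ordering is used for pruning, so the following are assumptions made only for illustration: a 4-symbol alphabet, Euclidean distance over symbol indices, significance measured as the number of non-overlapping subsequences within a radius, and a triangle-inequality cutoff over the sorted order. The names `to_sax`, `dist`, and `top_k_motifs` are hypothetical.

```python
import math
import random

# Breakpoints that split a standard normal into 4 equiprobable regions
# (assumed alphabet size of 4; the abstract does not state one).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]


def to_sax(series, n_segments):
    """Z-normalize, average into n_segments pieces (PAA), map means to symbols 0..3."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    z = [(x - mean) / std for x in series]
    seg = len(z) // n_segments  # assumes len(series) is a multiple of n_segments
    paa = [sum(z[i * seg:(i + 1) * seg]) / seg for i in range(n_segments)]
    return [sum(v > b for b in BREAKPOINTS) for v in paa]


def dist(a, b):
    """Euclidean distance over symbol indices (a stand-in metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_motifs(series, n_segments, window, radius, k, seed=0):
    """Return up to k (start_index, neighbor_count) pairs, highest count first."""
    symbols = to_sax(series, n_segments)
    subs = [symbols[i:i + window] for i in range(len(symbols) - window + 1)]
    ref = random.Random(seed).choice(subs)  # randomly selected reference subsequence

    # Linearize: project every subsequence onto its distance from the reference.
    order = sorted((dist(s, ref), i) for i, s in enumerate(subs))

    counts = [0] * len(subs)  # extra storage is linear in the number of subsequences
    for pos in range(len(order)):
        d_i, i = order[pos]
        for nxt in range(pos + 1, len(order)):
            d_j, j = order[nxt]
            if d_j - d_i > radius:
                break  # triangle inequality: no later entry can be within radius
            # Skip overlapping (trivial) matches, then verify the true distance.
            if abs(i - j) >= window and dist(subs[i], subs[j]) <= radius:
                counts[i] += 1
                counts[j] += 1

    ranked = sorted(range(len(subs)), key=lambda i: (-counts[i], i))
    return [(i, counts[i]) for i in ranked[:k]]
```

A call such as `top_k_motifs(series, n_segments=64, window=4, radius=1.0, k=3)` returns the start positions (in SAX symbols) of the three subsequences with the most neighbors. Sorting by distance to the reference lets the inner loop stop early, because two subsequences within `radius` of each other cannot differ by more than `radius` in their distances to the reference.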