Time Series Motif Discovery and Anomaly Detection Based on Subseries Join

Time series are sequences of data items measured at typically uniform intervals. They arise frequently in scientific and engineering applications, including finance, medicine, digital audio, and motion capture. Time series motifs are repeated similar subseries in one or more time series. Time series anomalies are unusual subseries in one or more time series. Finding motifs and finding anomalies in time series data are closely related problems, and both are useful in many domains, including medicine, motion capture, meteorology, and finance. This paper presents a novel approach to both the motif discovery problem and the anomaly detection problem. First, we use a subseries join operation to match similar subseries and to obtain similarity relationships among subseries of the time series data. The subseries join algorithm we use can efficiently and effectively tolerate noise, time-scaling, and phase shifts. Based on the similarity relationships found among subseries of the time series data, the motif discovery and anomaly detection problems can be converted to graph-theoretic problems solvable by known graph-theoretic algorithms. Experiments demonstrate the effectiveness of the proposed approach in discovering motifs and anomalies in real-world time series data. Experiments also demonstrate that the proposed approach is efficient when applied to large time series datasets.
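
To make the graph-theoretic view concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a simple stand-in for the subseries join: fixed-width, non-overlapping windows compared by Euclidean distance against a threshold, which does not tolerate time-scaling or phase shifts as the actual join does. The abstract does not say which graph algorithms are used, so the sketch assumes connected components as motif groups and isolated vertices as anomalies. All function names, the window width, and the threshold are hypothetical.

```python
import math
from itertools import combinations


def similarity_graph(series, width, threshold):
    """Build an undirected graph whose vertices are windows of the series.

    Assumption: non-overlapping fixed-width windows and plain Euclidean
    distance stand in for the subseries join described in the abstract.
    """
    windows = [series[i:i + width]
               for i in range(0, len(series) - width + 1, width)]
    adj = {i: set() for i in range(len(windows))}
    for i, j in combinations(range(len(windows)), 2):
        if math.dist(windows[i], windows[j]) <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as lists of vertices."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def motifs_and_anomalies(adj, min_size=2):
    """Assumed reading: a component with several similar windows is a motif
    group; a window with no similar partner is an anomaly candidate."""
    motifs = [sorted(c) for c in connected_components(adj)
              if len(c) >= min_size]
    anomalies = sorted(v for v, nbrs in adj.items() if not nbrs)
    return motifs, anomalies


# Toy series: windows 0, 1, and 3 repeat a small bump; window 2 is a spike.
series = [0, 1, 0, 0, 1, 0, 5, 9, 5, 0, 1, 0.1]
graph = similarity_graph(series, width=3, threshold=0.5)
print(motifs_and_anomalies(graph))  # ([[0, 1, 3]], [2])
```

Connected components are the loosest grouping one could choose here; a stricter definition, such as requiring every pair of subseries in a motif to be mutually similar, would call for a clique-based algorithm instead.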
