Motif Discovery Using Similarity-Constraints Deep Neural Networks

Discovering frequently occurring patterns (motifs) in time series has many real-life applications, including financial data, streaming media data, meteorological data, and sensor data. Providing efficient motif discovery algorithms is challenging when the time series is big. Existing algorithms that try to improve performance fall into two categories: (i) those that reduce the computation cost while keeping the original time series dimensions; and (ii) those that apply feature representation models to reduce the dimensions. However, both categories have limitations when scaling to big time series. The performance of algorithms in the first category depends heavily on the dimensionality of the original time series, so they perform poorly when the time series is big. Algorithms in the second category cannot guarantee that the original similarity properties are preserved, which means that originally similar patterns may be identified as dissimilar. To address these limitations, we provide an efficient motif discovery algorithm, called FastM, which reduces dimensions while maintaining the similarity properties. FastM extends the stacked AutoEncoder, a deep neural network, by introducing new central loss functions based on labels assigned by clustering algorithms. Comprehensive experimental results on three real-life datasets demonstrate both the high efficiency and accuracy of FastM.
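
The idea of combining an autoencoder's reconstruction objective with a cluster-based central loss can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the use of mean squared error for reconstruction, the use of per-cluster mean codes as centers, and the `weight` parameter that balances the two terms. It is not FastM's actual implementation.

```python
def reconstruction_loss(inputs, outputs):
    """Mean over samples of the squared error between input and reconstruction."""
    total = sum((x - y) ** 2
                for a, b in zip(inputs, outputs)
                for x, y in zip(a, b))
    return total / len(inputs)


def center_loss(codes, labels):
    """Mean squared distance from each low-dimensional code to its cluster center.

    `labels` are assumed to come from a clustering algorithm run on the
    original subsequences; each center is the mean code of its cluster.
    """
    sums, counts = {}, {}
    for code, label in zip(codes, labels):
        acc = sums.setdefault(label, [0.0] * len(code))
        for i, v in enumerate(code):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centers = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    total = 0.0
    for code, label in zip(codes, labels):
        total += sum((v - c) ** 2 for v, c in zip(code, centers[label]))
    return total / len(codes)


def combined_loss(inputs, outputs, codes, labels, weight=0.5):
    """Reconstruction term plus a weighted similarity (center) term.

    `weight` is a hypothetical hyperparameter, not taken from the paper.
    """
    return reconstruction_loss(inputs, outputs) + weight * center_loss(codes, labels)


if __name__ == "__main__":
    # Three toy subsequences, their reconstructions, 2-D codes, and cluster labels.
    inputs = [[1.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
    outputs = [[1.0, 2.5], [0.0, 0.0], [1.0, 1.0]]
    codes = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
    labels = [0, 0, 1]
    print(combined_loss(inputs, outputs, codes, labels))
```

Under these assumptions, the center term penalizes codes that drift away from other members of their cluster, so subsequences that were grouped together in the original space are encouraged to stay close after dimensionality reduction.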
