Motif-Based Classification of Time Series with Bayesian Networks and SVMs

Classification of time series is an important task with many challenging applications, such as brain wave (EEG) analysis, signature verification, and speech recognition. In this paper, we show how characteristic local patterns (motifs) can improve classification accuracy. We introduce a new motif class, generalized semi-continuous motifs. To allow for flexibility and robustness to noise, these motifs may include gaps of various lengths as well as generic and more specific wildcards. We propose an efficient algorithm for mining generalized sequential motifs. In experiments on real medical data, we show that generalized semi-continuous motifs improve the accuracy of SVMs and Bayesian networks for time series classification.
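To make the notion of a motif with gaps and wildcards more concrete, the following minimal sketch checks whether such a motif occurs in a series and turns a set of motifs into binary features that a classifier such as an SVM could consume. It assumes the time series has already been discretized into a string of symbols, which the abstract does not specify. The motif encoding and the names `ANY`, `Gap`, `contains`, and `motif_features` are hypothetical choices made for illustration; this is not the mining algorithm proposed in the paper.

```python
from dataclasses import dataclass

# Hypothetical motif encoding (an assumption, not taken from the paper):
#   "a"          exact symbol
#   {"a", "b"}   specific wildcard: any one of the listed symbols
#   ANY          generic wildcard: exactly one arbitrary symbol
#   Gap(lo, hi)  between lo and hi arbitrary symbols (inclusive)
ANY = None


@dataclass(frozen=True)
class Gap:
    lo: int
    hi: int


def _match_at(motif, seq, i, j):
    """Return True if motif[i:] matches a prefix of seq[j:]."""
    if i == len(motif):
        return True
    element = motif[i]
    if isinstance(element, Gap):
        # Try every allowed gap length that still fits into the sequence.
        return any(
            _match_at(motif, seq, i + 1, j + k)
            for k in range(element.lo, element.hi + 1)
            if j + k <= len(seq)
        )
    if j >= len(seq):
        return False
    symbol = seq[j]
    if element is ANY:
        matches = True
    elif isinstance(element, (set, frozenset)):
        matches = symbol in element
    else:
        matches = symbol == element
    return matches and _match_at(motif, seq, i + 1, j + 1)


def contains(motif, seq):
    """Return True if the motif occurs anywhere in the symbolic sequence."""
    return any(_match_at(motif, seq, 0, start) for start in range(len(seq)))


def motif_features(motifs, seq):
    """Binary feature vector: 1 if the motif occurs in seq, else 0."""
    return [1 if contains(m, seq) else 0 for m in motifs]


if __name__ == "__main__":
    series = "abbdacc"
    motifs = [
        ["a", Gap(1, 2), {"c", "d"}],  # 'a', then 1-2 arbitrary symbols, then 'c' or 'd'
        ["d", ANY, "d"],               # 'd', any single symbol, 'd'
    ]
    print(motif_features(motifs, series))
```

Such feature vectors, computed per time series, could be passed to any standard classifier; how motifs are selected and weighted in the paper is not stated in the abstract.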
