Discovering Key Sequences in Time Series Data for Pattern Classification

This paper addresses the problem of discovering key sequences in time series data for pattern classification. The aim is to find, in a symbolic database, all sequences that are both indicative and non-redundant; such a sequence is called a key sequence in this paper. To solve this problem, we first establish criteria for evaluating sequences in terms of two measures: evaluation base and discriminating power. The main idea is to accept as indicative those sequences that appear frequently and co-occur strongly with consequents. A sequence search algorithm is then proposed to locate indicative sequences in the search space. Nodes encountered during the search are handled so that the search results remain complete while redundancy is removed. We also show that the identified key sequences can later be used as strong evidence in probabilistic reasoning to determine the class to which a new time series most probably belongs.
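As a rough illustration of the two evaluation measures, the sketch below scores a candidate sequence against a small labelled symbolic database. The abstract does not give the formal definitions, so the ones used here are assumptions: the evaluation base is taken to be the number of series that contain the sequence as a contiguous run, and the discriminating power is taken to be the share of those series that belong to the most common class (the consequent). The function names, the thresholds `min_base` and `min_power`, and the toy data are all hypothetical.

```python
from collections import Counter


def contains(series, seq):
    """Return True if seq occurs as a contiguous run of symbols in series."""
    n, m = len(series), len(seq)
    return any(tuple(series[i:i + m]) == tuple(seq) for i in range(n - m + 1))


def evaluate(seq, database):
    """Score a candidate sequence (assumed definitions, not the paper's).

    database is a list of (symbol_series, class_label) pairs.
    Returns (evaluation_base, best_class, discriminating_power).
    """
    hits = Counter(label for series, label in database if contains(series, seq))
    base = sum(hits.values())          # series in which the sequence appears
    if base == 0:
        return 0, None, 0.0
    label, count = hits.most_common(1)[0]
    return base, label, count / base   # co-occurrence rate with that class


def is_indicative(seq, database, min_base=2, min_power=0.8):
    """Accept sequences that are frequent enough and discriminating enough."""
    base, _, power = evaluate(seq, database)
    return base >= min_base and power >= min_power


# Hypothetical symbolic database: each series is a string of symbols.
database = [
    ("abcab", "normal"),
    ("bbcab", "normal"),
    ("acbba", "faulty"),
    ("cbbac", "faulty"),
    ("abbca", "faulty"),
]

print(evaluate("cab", database))
print(evaluate("bb", database))
print(is_indicative("cab", database), is_indicative("bb", database))
```

Under these assumed definitions, a sequence that appears often but is spread across classes is rejected for low discriminating power, while a sequence that appears too rarely is rejected for an insufficient evaluation base, however well it separates the classes.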
