Classification of Time Sequences using Graphs of Temporal Constraints

We introduce two algorithms that learn to classify Symbolic and Scalar Time Sequences (SSTS), an extension of multivariate time series. An SSTS is a set of events and a set of scalars. An event is defined by a symbol and a time stamp. A scalar is defined by a symbol and a function that maps each possible time stamp of the data to a number. The proposed algorithms rely on temporal patterns called Graphs of Temporal Constraints (GTCs). A GTC is a directed graph in which vertices express occurrences of specific events, and edges express temporal constraints between occurrences of pairs of events. Additionally, each vertex of a GTC can be augmented with numeric constraints on scalar values. We allow GTCs to be cyclic and/or disconnected. The first algorithm extracts sets of co-dependent GTCs to be used in a voting mechanism. The second algorithm builds decision-forest-like representations in which each node is a GTC. In both algorithms, the extraction of GTCs and model building are interleaved. The two algorithms are closely related, and they exhibit complementary properties in terms of complexity, performance, and interpretability. The main novelties of this work are the direct building of the model and the efficient learning of GTC structures. We explain the proposed algorithms and evaluate their performance on a diverse collection of 59 benchmark data sets. In these experiments, our algorithms are highly competitive: in most cases they closely match or outperform state-of-the-art alternatives in computational speed, while dominating in classification accuracy.
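To make the definitions above concrete, the following is a minimal Python sketch of an SSTS, a GTC, and a brute-force test of whether a GTC occurs in an SSTS. It illustrates the definitions only and is not the learning procedure of either algorithm. All names (`SSTS`, `Vertex`, `GTC`, `matches`) are hypothetical, and two representations are assumptions made for illustration: a temporal constraint is modelled as a closed interval on the delay between two event occurrences, and a numeric constraint as a closed interval on one scalar evaluated at the time of the event.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SSTS:
    """A set of events plus a set of scalars (hypothetical representation)."""
    events: List[Tuple[str, float]]  # (symbol, time stamp)
    scalars: Dict[str, Callable[[float], float]] = field(default_factory=dict)


@dataclass
class Vertex:
    """One event occurrence, optionally constrained on a scalar value."""
    symbol: str
    # Assumed form of a numeric constraint: (scalar symbol, low, high).
    scalar_bound: Optional[Tuple[str, float, float]] = None


@dataclass
class GTC:
    """Directed graph; it may be cyclic and/or disconnected."""
    vertices: List[Vertex]
    # Assumed form of an edge: (src, dst, min_delay, max_delay),
    # read as min_delay <= t[dst] - t[src] <= max_delay.
    edges: List[Tuple[int, int, float, float]]


def matches(gtc: GTC, seq: SSTS) -> bool:
    """Return True if some assignment of events to vertices satisfies the GTC."""
    candidates = []
    for v in gtc.vertices:
        # Time stamps of events carrying the vertex symbol.
        times = [t for s, t in seq.events if s == v.symbol]
        if v.scalar_bound is not None:
            name, lo, hi = v.scalar_bound
            # Keep only occurrences where the scalar lies within the bound.
            times = [t for t in times if lo <= seq.scalars[name](t) <= hi]
        candidates.append(times)
    # Exhaustive search over assignments: exponential, for illustration only.
    for assignment in product(*candidates):
        if all(lo <= assignment[j] - assignment[i] <= hi
               for i, j, lo, hi in gtc.edges):
            return True
    return False


seq = SSTS(
    events=[("a", 1.0), ("b", 3.0), ("a", 10.0)],
    scalars={"temp": lambda t: 20.0 + t},
)
# "a" followed by "b" within 1 to 5 time units, with temp in [22, 25] at "b".
gtc = GTC(
    vertices=[Vertex("a"), Vertex("b", ("temp", 22.0, 25.0))],
    edges=[(0, 1, 1.0, 5.0)],
)
found = matches(gtc, seq)
```

The exhaustive search keeps the sketch short; it is only practical for small patterns and short sequences, and says nothing about how GTCs are learned or matched efficiently.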

[1]  Kurt Hornik,et al.  The support vector machine under test , 2003, Neurocomputing.

[2]  George C. Runger,et al.  A time series forest for classification and feature extraction , 2013, Inf. Sci..

[3]  Willi Klösgen,et al.  Explora: A Multipattern and Multistrategy Discovery Assistant , 1996, Advances in Knowledge Discovery and Data Mining.

[4]  Eamonn Keogh Exact Indexing of Dynamic Time Warping , 2002, VLDB.

[5]  Jignesh M. Patel,et al.  An efficient and accurate method for evaluating time series similarity , 2007, SIGMOD '07.

[6]  Philip S. Yu,et al.  Mining significant graph patterns by leap search , 2008, SIGMOD Conference.

[7]  A. Akhmetova Discovery of Frequent Episodes in Event Sequences , 2006 .

[8]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[9]  Jian Pei,et al.  CMAR: accurate and efficient classification based on multiple class-association rules , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[10]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[11]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[12]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[13]  Geoffrey I. Webb,et al.  Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining , 2009, J. Mach. Learn. Res..

[14]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[15]  Milos Hauskrecht,et al.  Mining recent temporal patterns for event detection in multivariate time series data , 2012, KDD.

[16]  Juan José Rodríguez Diez,et al.  Interval and dynamic time warping-based decision trees , 2004, SAC '04.

[17]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[18]  Hans-Peter Kriegel,et al.  Similarity Search on Time Series Based on Threshold Queries , 2006, EDBT.

[19]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[20]  Donald J. Berndt,et al.  Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.

[21]  Jian Pei,et al.  Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[22]  Lei Chen,et al.  On The Marriage of Lp-norms and Edit Distance , 2004, VLDB.

[23]  Sebastian Nowozin,et al.  Weighted Substructure Mining for Image Analysis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Lei Chen,et al.  Robust and fast similarity search for moving object trajectories , 2005, SIGMOD '05.

[25]  Geoffrey I. Webb,et al.  Advances in Knowledge Discovery and Data Mining , 2018, Lecture Notes in Computer Science.

[26]  Stephen Muggleton,et al.  Inductive Logic Programming , 2011, Lecture Notes in Computer Science.

[27]  Henrik Boström,et al.  Learning First Order Logic Time Series Classifiers: Rules and Boosting , 2000, PKDD.

[28]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[29]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[30]  Siegfried Nijssen,et al.  Pattern-Based Classification: A Unifying Perspective , 2011, ArXiv.

[31]  Yoav Freund,et al.  A Short Introduction to Boosting , 1999 .

[32]  Mathieu Guillame-Bert,et al.  Learning Temporal Association Rules on Symbolic Time Sequences , 2012, ACML.

[33]  Stephen D. Bay,et al.  Detecting Group Differences: Mining Contrast Sets , 2001, Data Mining and Knowledge Discovery.

[34]  Yannis Theodoridis,et al.  Index-based Most Similar Trajectory Search , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[35]  Christophe Dousson,et al.  Discovering Chronicles with Numerical Time Constraints from Alarm Logs for Monitoring Dynamic Systems , 1999, IJCAI.

[36]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.