RPM: Representative Pattern Mining for Efficient Time Series Classification

Time series classification is an important problem that has received a great amount of attention from researchers and practitioners in the past two decades. In this work, we propose a novel algorithm for time series classification based on the discovery of class-specific representative patterns. We define the representative patterns of a class as the set of subsequences that has the greatest discriminative power to distinguish one class of time series from another. Our approach rests upon two techniques with linear complexity: symbolic discretization of time series, which generalizes the structural patterns, and grammatical inference, which automatically finds recurrent correlated patterns of variable length, producing an initial pool of common patterns shared by many instances in a class. From this pool of candidate patterns, our algorithm selects the most representative patterns that capture the class specificities and that can be used to effectively discriminate between time series classes. Through an exhaustive experimental evaluation, we show that our algorithm is competitive in accuracy and speed with state-of-the-art classification techniques on the UCR time series repository, is robust on shifted data, and demonstrates excellent performance on real-world noisy medical time series.
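To make the symbolic discretization step concrete, the following is a minimal sketch of a SAX-style transform (z-normalization, piecewise aggregate approximation, then mapping segment means to letters using equiprobable Gaussian breakpoints). It is an illustration under stated assumptions, not the authors' implementation: the function names `znorm`, `paa`, and `sax_word`, the segment count, and the alphabet are hypothetical choices, and the abstract does not specify which parameters the algorithm uses.

```python
from statistics import NormalDist, mean, pstdev


def znorm(series, eps=1e-8):
    """Rescale a series to zero mean and unit variance (flat series map to zeros)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma < eps:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` chunks."""
    n = len(series)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    return [
        mean(series[i * n // segments:(i + 1) * n // segments])
        for i in range(segments)
    ]


def sax_word(series, segments=4, alphabet="abcd"):
    """Turn a numeric series into a short word over `alphabet` (illustrative only)."""
    # Breakpoints split the standard normal distribution into equiprobable bins.
    cuts = [
        NormalDist().inv_cdf(k / len(alphabet))
        for k in range(1, len(alphabet))
    ]
    # Each segment mean is assigned the letter of the bin it falls into.
    return "".join(
        alphabet[sum(value > cut for cut in cuts)]
        for value in paa(znorm(series), segments)
    )


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8]))  # steadily rising series
```

Because the series is z-normalized first, a rescaled or offset copy of the same shape yields the same word, which is the sense in which discretization generalizes structural patterns. In a sketch like this, applying `sax_word` over a sliding window would produce the sequence of words on which a grammar inference step could then search for recurring patterns.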
