Selection of Relevant and Non-Redundant Multivariate Ordinal Patterns for Time Series Classification

Transforming multivariate time series into feature spaces is common for data mining tasks such as classification. Ordinality is an important property of time series that provides a qualitative representation of the underlying dynamic regime. In a multivariate time series, ordinalities from multiple dimensions combine to be discriminative for the classification problem. However, existing works on ordinality do not address the multivariate nature of time series. Multivariate ordinal patterns pose a computational challenge: the number of pattern combinations explodes, yet not all patterns are relevant or provide novel information for the classification. In this work, we propose a technique for the extraction and selection of relevant and non-redundant multivariate ordinal patterns from the high-dimensional combinatorial search space. Our proposed approach, ordinal feature extraction (ordex), simultaneously extracts ordinal patterns and scores their relevance and redundancy without training a classifier. As a filter-based approach, ordex aims to select a set of relevant patterns with complementary information. Hence, using our scoring function based on the principles of Chebyshev’s inequality, we maximize the relevance of the patterns and minimize the correlation between them. Our experiments on real-world datasets show that ordinality in time series contains valuable information for classification in several applications.
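
To make the notion of a multivariate ordinal pattern concrete, the following is a minimal illustrative sketch, not the ordex method itself. It assumes the common argsort-based encoding of ordinal patterns, in which each sliding window is replaced by the permutation that sorts it, and it forms a joint pattern by pairing the per-dimension patterns at the same time index. The function names `ordinal_patterns` and `joint_pattern_counts` and the parameter `order` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import factorial

import numpy as np


def ordinal_patterns(series, order=3):
    """Encode each sliding window of length `order` as the permutation that sorts it."""
    x = np.asarray(series, dtype=float)
    patterns = []
    for start in range(len(x) - order + 1):
        window = x[start:start + order]
        # A stable sort breaks ties by position, so equal values get a fixed order.
        patterns.append(tuple(int(i) for i in np.argsort(window, kind="stable")))
    return patterns


def joint_pattern_counts(mts, order=3):
    """Count joint ordinal patterns over all dimensions of a multivariate series.

    `mts` is assumed to have shape (n_dims, n_timesteps).
    """
    per_dim = [ordinal_patterns(dim, order) for dim in mts]
    # zip aligns the patterns of all dimensions at the same window position.
    return Counter(zip(*per_dim))


if __name__ == "__main__":
    mts = [[1, 2, 3, 2, 1],
           [3, 2, 1, 2, 3]]
    counts = joint_pattern_counts(mts, order=3)
    for pattern, count in counts.items():
        print(pattern, count)
    # Upper bound on distinct joint patterns: (order!) ** n_dims
    print("possible joint patterns:", factorial(3) ** len(mts))
```

With `order` fixed, the number of possible joint patterns grows as `(order!) ** n_dims`, which is the combinatorial explosion that motivates selecting only relevant and non-redundant patterns rather than enumerating all of them.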
