Time Series Classification Through Transformation and Ensembles

The problem of time series classification (TSC), where we consider any real-valued ordered data a time series, offers a specific challenge. Unlike traditional classification problems, the ordering of attributes is often crucial for identifying discriminatory features between classes. TSC problems arise across a diverse range of domains, and this variety has meant that no single approach outperforms all others. The general consensus is that the benchmark for TSC is nearest neighbour (NN) classifiers using Euclidean distance or Dynamic Time Warping (DTW). Though conceptually simple, NN classifiers are widely reported to be very difficult to beat, and new work is often compared against them. The majority of approaches have focused on classification in the time domain, typically proposing alternative elastic similarity measures for NN classification. Other work has investigated more specialised approaches, such as building support vector machines on variable intervals and creating tree-based ensembles with summary measures. We wish to answer a specific research question: given a new TSC problem without any prior, specialised knowledge, what is the best way to approach the problem? Our thesis is that the best methodology is to first transform data into alternative representations where discriminatory features are more easily detected, and then build ensemble classifiers on each representation. In support of our thesis, we propose an elastic ensemble classifier that we believe is the first ever to significantly outperform DTW on the widely used UCR datasets. Next, we propose the shapelet-transform, a new data transformation that allows complex classifiers to be coupled with shapelets, which outperforms the original algorithm and is competitive with DTW. Finally, we combine these two works with heterogeneous ensembles built on autocorrelation and spectral-transformed data to propose a collective of transformation-based ensembles (COTE). The results of COTE are, we believe, the best ever published on the UCR datasets.
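
As an illustration of the benchmark named above, the following is a minimal Python sketch of a 1-NN classifier using DTW. It is not the implementation used in this work: the function names `dtw_distance` and `nn_classify`, the squared pointwise cost, and the optional `window` parameter (a simple band constraint on the warping path) are all assumptions made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences (illustrative sketch).

    Uses a squared pointwise cost and returns the square root of the
    accumulated cost. `window` is an assumed parameter: the maximum
    allowed offset between aligned indices (None means unconstrained).
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide so the end cell is reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def nn_classify(train, labels, query, window=None):
    """Return the label of the training series closest to `query` under DTW."""
    best = min(range(len(train)),
               key=lambda k: dtw_distance(train[k], query, window))
    return labels[best]


if __name__ == "__main__":
    # Toy data, invented for this example only.
    train = [[0, 0, 1, 2], [2, 1, 0, 0]]
    labels = ["up", "down"]
    print(nn_classify(train, labels, [0, 1, 2, 2]))
```

With `window=0` the warping path is forced onto the diagonal, so for equal-length series the function reduces to Euclidean distance, the other benchmark mentioned in the text.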