Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles

Recently, two ideas have been explored that lead to more accurate algorithms for time-series classification (TSC). First, it has been shown that the simplest way to gain improvement on TSC problems is to transform into an alternative data space where discriminatory features are more easily detected. Second, it was demonstrated that with a single data representation, improved accuracy can be achieved through simple ensemble schemes. We combine these two principles to test the hypothesis that forming a collective of ensembles of classifiers on different data transformations improves the accuracy of time-series classification. The collective contains classifiers constructed in the time, frequency, change, and shapelet transformation domains. For the time domain, we use a set of elastic distance measures. For the other domains, we use a range of standard classifiers. Through extensive experimentation on 72 datasets, including all of the 46 UCR datasets, we demonstrate that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm. We investigate alternative hierarchical collective structures and demonstrate the utility of the approach on a new problem involving classifying Caenorhabditis elegans mutant types.

[1]  Eamonn J. Keogh,et al.  Logical-shapelets: an expressive primitive for time series classification , 2011, KDD.

[2]  Li Wei,et al.  Experiencing SAX: a novel symbolic representation of time series , 2007, Data Mining and Knowledge Discovery.

[3]  Eamonn J. Keogh,et al.  A Complexity-Invariant Distance Measure for Time Series , 2011, SDM.

[4]  Laura J. Grundy,et al.  A dictionary of behavioral motifs reveals clusters of genes affecting Caenorhabditis elegans locomotion , 2012, Proceedings of the National Academy of Sciences.

[5]  Jorge Caiado,et al.  A periodogram-based metric for time series classification , 2006, Comput. Stat. Data Anal..

[6]  George C. Runger,et al.  A Bag-of-Features Framework to Classify Time Series , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Gautam Das,et al.  The Move-Split-Merge Metric for Time Series , 2013, IEEE Transactions on Knowledge and Data Engineering.

[8]  Lars Schmidt-Thieme,et al.  Learning time-series shapelets , 2014, KDD.

[9]  Jason Lines,et al.  Transformation Based Ensembles for Time Series Classification , 2012, SDM.

[10]  Lei Chen,et al.  Robust and fast similarity search for moving object trajectories , 2005, SIGMOD '05.

[11]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[12]  Eamonn J. Keogh,et al.  Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets , 2013, SDM.

[13]  Vinícius M. A. de Souza,et al.  Time Series Classification Using Compression Distance of Recurrence Plots , 2013, 2013 IEEE 13th International Conference on Data Mining.

[14]  Konstantinos Kalpakis,et al.  Distance measures for effective clustering of ARIMA time-series , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[15]  Yuan Li,et al.  Rotation-invariant similarity in time series using bag-of-patterns representation , 2012, Journal of Intelligent Information Systems.

[16]  Jason Lines,et al.  An Experimental Evaluation of Nearest Neighbour Time Series Classification , 2014, ArXiv.

[17]  Gareth J. Janacek,et al.  Clustering time series from ARMA models with clipped data , 2004, KDD.

[18]  Jason Lines,et al.  Time series classification with ensembles of elastic distance measures , 2015, Data Mining and Knowledge Discovery.

[19]  Olufemi A. Omitaomu,et al.  Weighted dynamic time warping for time series classification , 2011, Pattern Recognit..

[20]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[21]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[22]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[23]  Jason Lines,et al.  Classification of time series by shapelet transformation , 2013, Data Mining and Knowledge Discovery.

[24]  Nick S. Jones,et al.  Highly Comparative Feature-Based Time-Series Classification , 2014, IEEE Transactions on Knowledge and Data Engineering.

[25]  Jason Lines,et al.  A shapelet transform for time series classification , 2012, KDD.

[26]  Eamonn J. Keogh,et al.  CID: an efficient complexity-invariant distance for time series , 2013, Data Mining and Knowledge Discovery.

[27]  Eamonn J. Keogh,et al.  Time series shapelets: a novel technique that allows accurate, interpretable and fast classification , 2010, Data Mining and Knowledge Discovery.

[28]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[29]  Marcella Corduas,et al.  Time series clustering and classification by the autoregressive metric , 2008, Comput. Stat. Data Anal..

[30]  Gareth J. Janacek,et al.  A Run Length Transformation for Discriminating Between Auto Regressive Time Series , 2014, J. Classif..

[31]  George C. Runger,et al.  A time series forest for classification and feature extraction , 2013, Inf. Sci..

[32]  Eamonn J. Keogh,et al.  Three Myths about Dynamic Time Warping Data Mining , 2005, SDM.

[33]  Laura J. Grundy,et al.  A database of C. elegans behavioral phenotypes , 2013, Nature Methods.

[34]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[35]  Juan José Rodríguez Diez,et al.  Rotation Forest: A New Classifier Ensemble Method , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Pierre-François Marteau,et al.  Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.