Highly Comparative Feature-Based Time-Series Classification

A highly comparative, feature-based approach to time-series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are drawn from across the scientific time-series analysis literature and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way reduces dimensionality by orders of magnitude, allowing the method to perform well on very large data sets containing long time series or time series of different lengths. For many of the data sets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping. Most importantly, the features selected provide an understanding of the properties of the data set, insight that can guide further scientific investigation.
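The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Five hand-picked features stand in for the thousands in the paper, a least-squares linear classifier stands in for the unspecified linear classifier, and candidate feature sets are scored by 5-fold cross-validated accuracy. The synthetic data (white noise versus an AR(1) process), the function names, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np

FEATURE_NAMES = ["mean", "std", "lag1_autocorr", "diff_std", "skewness"]

def extract_features(ts):
    """Map one time series of any length to a fixed-length feature vector."""
    x = np.asarray(ts, dtype=float)
    z = (x - x.mean()) / (x.std() + 1e-12)      # z-scored copy
    lag1 = np.mean(z[:-1] * z[1:])              # approximate lag-1 autocorrelation
    return np.array([x.mean(), x.std(), lag1, np.std(np.diff(x)), np.mean(z ** 3)])

def fit_linear(X, y):
    """Least-squares linear classifier with a bias term; labels are -1 or +1."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    A = np.column_stack([X, np.ones(len(X))])
    return np.where(A @ w >= 0, 1, -1)

def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of the linear classifier on feature matrix X."""
    idx = np.arange(len(y))
    correct = 0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        w = fit_linear(X[train], y[train])
        correct += np.sum(predict_linear(w, X[fold]) == y[fold])
    return correct / len(y)

def greedy_forward_selection(X, y, max_features=3):
    """Add the single best feature at each step; stop when accuracy stops improving."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [cv_accuracy(X[:, selected + [j]], y) for j in candidates]
        k = int(np.argmax(scores))
        if scores[k] <= best_score:
            break
        selected.append(candidates[k])
        best_score = scores[k]
    return selected, best_score

# Synthetic two-class data (assumed for illustration): white noise (+1) vs AR(1) (-1),
# with a different length for every series.
rng = np.random.default_rng(0)
series, labels = [], []
for i in range(80):
    n = int(rng.integers(100, 300))
    noise = rng.standard_normal(n)
    if i % 2 == 0:
        series.append(noise)
        labels.append(1)
    else:
        ar = np.zeros(n)
        for t in range(1, n):
            ar[t] = 0.8 * ar[t - 1] + noise[t]
        series.append(ar)
        labels.append(-1)

X = np.vstack([extract_features(s) for s in series])   # shape (80, 5)
y = np.array(labels, dtype=float)

selected, score = greedy_forward_selection(X, y)
print([FEATURE_NAMES[j] for j in selected], round(score, 3))
```

Each series is reduced to the same five numbers whatever its length, so the classifier needs no distances between series and no alignment of unequal lengths. The names of the selected features also indicate which property separates the classes.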
