Towards Time Series Classification without Human Preprocessing

Similarity search is a core functionality in many data mining algorithms. Over the past decade, these algorithms have mostly been designed to rely on human assistance to extract characteristic, aligned patterns of equal length and scaling. Such human assistance is not cost-effective. We propose the shotgun distance, a similarity metric that extracts, scales, and aligns segments from a query to a sample time series. This simplifies the classification of time series as produced by sensors. Our shotgun ensemble classifier classifies a time series based on its segments at varying lengths. It improves on the best published accuracies in case studies in bioacoustics, human motion detection, spectrographs, and personalized medicine. Finally, it performs better than the state of the art on the official UCR classification benchmark.
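
The extract, scale, and align steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`z_normalize`, `shotgun_distance`), the use of disjoint query windows, z-normalization as the scaling step, and the squared Euclidean distance are all assumptions made for illustration, since the abstract does not fix these details.

```python
import math


def z_normalize(segment):
    """Scale a segment to zero mean and unit variance (assumed scaling step)."""
    n = len(segment)
    mean = sum(segment) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in segment) / n)
    if std < 1e-12:
        # A constant segment carries no shape information; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in segment]


def shotgun_distance(query, sample, window):
    """Hypothetical sketch of a segment-based distance.

    Extract: cut the query into disjoint segments of length `window`.
    Scale:   z-normalize every segment.
    Align:   slide each query segment along the sample and keep the
             smallest squared Euclidean distance.
    The result is the sum of these per-segment minima.
    """
    if window <= 0 or window > len(query) or window > len(sample):
        raise ValueError("window must fit into both query and sample")

    # All normalized sliding windows of the sample, computed once.
    sample_windows = [
        z_normalize(sample[i:i + window])
        for i in range(len(sample) - window + 1)
    ]

    total = 0.0
    for start in range(0, len(query) - window + 1, window):
        segment = z_normalize(query[start:start + window])
        total += min(
            sum((a - b) ** 2 for a, b in zip(segment, candidate))
            for candidate in sample_windows
        )
    return total


# The query pattern occurs in the sample with a different offset and
# amplitude; after scaling and alignment the two match.
query = [0, 1, 0, 1]
sample = [5, 5, 0, 10, 0, 10, 5]
distance = shotgun_distance(query, sample, window=4)
```

Under these assumptions the distance is not symmetric, because only the query is cut into segments. A classifier in the spirit of the abstract could evaluate such a distance for several values of `window` and combine the resulting nearest-neighbor votes; that step is not shown here.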
