The BOSS is concerned with time series classification in the presence of noise

Similarity search is one of the most important and probably best studied methods for data mining. In the context of time series analysis it reaches its limits when it comes to mining raw datasets. The raw time series data may be recorded at variable lengths, be noisy, or are composed of repetitive substructures. These build a foundation for state of the art search algorithms. However, noise has been paid surprisingly little attention to and is assumed to be filtered as part of a preprocessing step carried out by a human. Our Bag-of-SFA-Symbols (BOSS) model combines the extraction of substructures with the tolerance to extraneous and erroneous data using a noise reducing representation of the time series. We show that our BOSS ensemble classifier improves the best published classification accuracies in diverse application areas and on the official UCR classification benchmark datasets by a large margin.

[1]  Eamonn J. Keogh,et al.  Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping , 2012, KDD.

[2]  Eamonn J. Keogh,et al.  A Complexity-Invariant Distance Measure for Time Series , 2011, SDM.

[3]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[4]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[5]  Anthony Bagnall,et al.  A Shapelet Transform for Time Series Classification , 2015 .

[6]  Eamonn J. Keogh,et al.  Logical-shapelets: an expressive primitive for time series classification , 2011, KDD.

[7]  Eamonn J. Keogh,et al.  Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases , 2001, Knowledge and Information Systems.

[8]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[9]  Jason Lines,et al.  Transformation Based Ensembles for Time Series Classification , 2012, SDM.

[10]  Eamonn J. Keogh,et al.  Time Series Classification under More Realistic Assumptions , 2013, SDM.

[11]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.

[12]  Ian G. Cumming,et al.  The momentary Fourier transformation derived from recursive matrix transformations , 1997, Proceedings of 13th International Conference on Digital Signal Processing.

[13]  Eamonn J. Keogh,et al.  Clustering Time Series Using Unsupervised-Shapelets , 2012, 2012 IEEE 12th International Conference on Data Mining.

[14]  Sergey Malinchik,et al.  SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model , 2013, 2013 IEEE 13th International Conference on Data Mining.

[15]  Eamonn J. Keogh,et al.  On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration , 2002, Data Mining and Knowledge Discovery.

[16]  Li Wei,et al.  Experiencing SAX: a novel symbolic representation of time series , 2007, Data Mining and Knowledge Discovery.

[17]  T. Warren Liao,et al.  Clustering of time series data - a survey , 2005, Pattern Recognit..

[18]  Yunhao Liu,et al.  Indexable PLA for Efficient Similarity Search , 2007, VLDB.

[19]  Eamonn J. Keogh,et al.  Time series shapelets: a novel technique that allows accurate, interpretable and fast classification , 2010, Data Mining and Knowledge Discovery.

[20]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[21]  Yuan Li,et al.  Rotation-invariant similarity in time series using bag-of-patterns representation , 2012, Journal of Intelligent Information Systems.

[22]  Eamonn J. Keogh,et al.  Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets , 2013, SDM.

[23]  Patrick Schäfer,et al.  SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets , 2012, EDBT '12.

[24]  Nitin Kumar,et al.  Time-series Bitmaps: a Practical Visualization Tool for Working with Large Time Series Databases , 2005, SDM.