Logical-shapelets: an expressive primitive for time series classification

Time series shapelets are small, local patterns in a time series that are highly predictive of a class; they are thus very useful features for building classifiers and for certain visualization and summarization tasks. Although shapelets were introduced only recently, they have already seen significant adoption and extension in the community. Despite their immense potential as a data mining primitive, shapelets have two important limitations. First, their expressiveness is limited to simple binary presence/absence questions. Second, even though shapelets are computed offline, the time taken to compute them is significant. In this work, we address the latter problem by introducing a novel algorithm that finds shapelets an order of magnitude faster than current methods. Our algorithm is based on intelligent caching and reuse of computations, and on admissible pruning of the search space. Because our algorithm is so fast, it creates an opportunity to consider more expressive shapelet queries. In particular, we introduce, for the first time, an augmented shapelet representation that distinguishes the data based on conjunctions or disjunctions of shapelets. We call our novel representation Logical-Shapelets. We demonstrate the efficiency of our approach on the classic benchmark datasets used for these problems, and present several case studies where logical shapelets significantly outperform the original shapelet representation and other time series classification techniques. We demonstrate the utility of our ideas in domains as diverse as gesture recognition, robotics, and biometrics.
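The presence/absence question a shapelet answers, and its extension to conjunctions and disjunctions, can be illustrated with a minimal Python sketch. It assumes the common choice of z-normalized Euclidean distance, a brute-force sliding-window search, and a fixed distance threshold attached to each shapelet; the names `znorm`, `subsequence_distance`, `Shapelet`, and `logical_present`, and the threshold values, are hypothetical. It does not implement the caching, reuse of computations, or admissible pruning that the abstract credits for the speedup, nor how shapelets and thresholds are learned from training data; the only optimization is standard early abandoning of the distance computation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize x (population std); a constant window maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std < 1e-12:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def subsequence_distance(series: Sequence[float], pattern: Sequence[float]) -> float:
    """Smallest z-normalized Euclidean distance between pattern and any
    same-length window of series. Brute force: no caching or pruning."""
    m = len(pattern)
    p = znorm(pattern)
    best = math.inf  # best squared distance seen so far
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        acc = 0.0
        for a, b in zip(p, w):
            acc += (a - b) ** 2
            if acc >= best:  # early abandoning: this window cannot beat best
                break
        best = min(best, acc)
    return math.sqrt(best)


@dataclass
class Shapelet:
    pattern: List[float]
    threshold: float  # hypothetical per-shapelet distance cutoff

    def present(self, series: Sequence[float]) -> bool:
        # The basic shapelet question: is some window closer than the threshold?
        return subsequence_distance(series, self.pattern) < self.threshold


def logical_present(series: Sequence[float], shapelets: Sequence[Shapelet],
                    op: str = "and") -> bool:
    """Conjunction ("and") or disjunction ("or") of shapelet presence tests."""
    hits = [s.present(series) for s in shapelets]
    if op == "and":
        return all(hits)
    if op == "or":
        return any(hits)
    raise ValueError(f"unknown op: {op!r}")


bump = Shapelet([0, 1, 2, 1, 0], threshold=0.5)
dip = Shapelet([0, -1, -2, -1, 0], threshold=0.5)
both = [0, 0, 0, 1, 2, 1, 0, 0, 0, -1, -2, -1, 0, 0]
only_bump = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(logical_present(both, [bump, dip], "and"))       # True
print(logical_present(only_bump, [bump, dip], "and"))  # False
print(logical_present(only_bump, [bump, dip], "or"))   # True
```

In this toy example, the bump shapelet on its own cannot separate a series containing both patterns from one containing only the bump, because the bump is present in both; the conjunction of bump and dip can. A disjunction instead accepts a series that contains either pattern.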
