Efficient Pattern-Based Time Series Classification on GPU

Time series shapelet discovery algorithm finds subsequences from a set of time series for use as primitives for time series classification. This algorithm has drawn a lot of interest because of the interpretability of its results. However, computation requirements restrict the algorithm from dealing with large data sets and may limit its application in many domains. In this paper, we address this issue by redesigning the algorithm for implementation on highly parallel Graphics Process Units (GPUs). We investigate several concepts of GPU programming and propose a dynamic programming algorithm that is suitable for implementation on GPUs. Results show that the proposed GPU implementation significantly reduces the running time of the shapelet discovery algorithm. For example, on the largest sample dataset from the original authors, the running time is reduced from half a day to two minutes.

[1]  Eamonn J. Keogh,et al.  Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs , 2010, 2010 IEEE International Conference on Data Mining.

[2]  Philip S. Yu,et al.  Direct Discriminative Pattern Mining for Effective Classification , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[3]  Eamonn J. Keogh,et al.  Scaling up dynamic time warping for datamining applications , 2000, KDD '00.

[4]  Nicole Krämer,et al.  Partial least squares regression for graph mining , 2008, KDD.

[5]  Rajat Raina,et al.  Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[6]  Jiawei Han,et al.  Classification of software behaviors for failure detection: a discriminative pattern mining approach , 2009, KDD.

[7]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[8]  Nathan Srebro,et al.  A GPU-tailored approach for training kernelized SVMs , 2011, KDD.

[9]  Eamonn J. Keogh,et al.  Logical-shapelets: an expressive primitive for time series classification , 2011, KDD.

[10]  Eamonn J. Keogh,et al.  Time series shapelets: a new primitive for data mining , 2009, KDD.

[11]  Ying Zhang,et al.  Program Optimization of Stencil Based Application on the GPU-Accelerated System , 2009, 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications.

[12]  Feng Yan,et al.  Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units , 2009, NIPS.

[13]  Jian Pei,et al.  A brief survey on sequence classification , 2010, SKDD.

[14]  Hui Ding,et al.  Querying and mining of time series data: experimental comparison of representations and distance measures , 2008, Proc. VLDB Endow..

[15]  Kurt Keutzer,et al.  Fast support vector machine training and classification on graphics processors , 2008, ICML '08.

[16]  Eamonn J. Keogh,et al.  A Complexity-Invariant Distance Measure for Time Series , 2011, SDM.

[17]  Jiawei Han,et al.  Discriminative Frequent Pattern Analysis for Effective Classification , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18]  Wen-mei W. Hwu,et al.  GPU computing gems , 2011 .