Parallelizing Dynamic Time Warping Algorithm Using Prefix Computations on GPU

The dynamic time warping (DTW) algorithm has O(n2) time complexity, which indicates that it is hard to process large-scale time series within an acceptable time. Recently, many researchers have used graphics processing units (GPUs) to accelerate the algorithm. Owing to the data dependence of DTW, however, most of existing GPU-based DTW algorithms exploits task-level parallelism by simply replicating the serial algorithm on every multiprocessor of a GPU. The fundamental issue with such coarse-grained parallelism is that the solvable problem size is severely limited by the share memory and cache of a GPU multiprocessor. In this study, we introduce a specific transformation of data dependence by using prefix computations. Further, we propose an efficient GPU-parallel DTW algorithm to address the problem of instance sizes limitation. The efficiency of our algorithm is validated through experiments, which demonstrate improved performance over existing GPU-based task-level parallel DTW algorithms. Our experimental results indicate speedups up to 99 times faster on NVIDIA GTX480, compared to CPU implementations.

[1]  Limin Xiao,et al.  A GPU-Accelerated Large-Scale Music Similarity Retrieval Method , 2013, 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing.

[2]  Eamonn J. Keogh,et al.  Accelerating Dynamic Time Warping Subsequence Search with GPUs and FPGAs , 2010, 2010 IEEE International Conference on Data Mining.

[3]  James R. Glass,et al.  Fast spoken query detection using lower-bound Dynamic Time Warping on Graphical Processing Units , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  Eamonn J. Keogh,et al.  A Novel Approximation to Dynamic Time Warping allows Anytime Clustering of Massive Time Series Datasets , 2012, SDM.

[5]  G. W. Hughes,et al.  Minimum Prediction Residual Principle Applied to Speech Recognition , 1975 .

[6]  Jeremy Buhler,et al.  Rolling partial prefix-sums to speedup evaluation of uniform and affine recurrence equations , 2011, Defense + Commercial Sensing.

[7]  Edans Flavius de Oliveira Sandes,et al.  Smith-Waterman Alignment of Huge Sequences with GPU in Linear Space , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[8]  Tele Tan,et al.  Classifying eye and head movement artifacts in EEG signals , 2011, 5th IEEE International Conference on Digital Ecosystems and Technologies (IEEE DEST 2011).

[9]  Eamonn J. Keogh,et al.  Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping , 2012, KDD.

[10]  S. Chiba,et al.  Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[11]  Selim G. Akl,et al.  Design and analysis of parallel algorithms , 1985 .

[12]  Zhen Lin,et al.  An Effective Approach for Vocal Melody Extraction from Polyphonic Music on GPU , 2013, NPC.

[13]  Srinivas Aluru,et al.  Parallel biological sequence comparison using prefix computations , 2003, J. Parallel Distributed Comput..

[14]  Eamonn J. Keogh,et al.  On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration , 2002, Data Mining and Knowledge Discovery.

[15]  Philip Chan,et al.  Toward accurate dynamic time warping in linear time and space , 2007, Intell. Data Anal..

[16]  Mark J. Harris,et al.  Parallel Prefix Sum (Scan) with CUDA , 2011 .

[17]  Eamonn J. Keogh,et al.  Scaling up dynamic time warping for datamining applications , 2000, KDD '00.

[18]  Panagiotis Papapetrou,et al.  Benchmarking dynamic time warping for music retrieval , 2010, PETRA '10.

[19]  Yu Wang,et al.  Accelerating subsequence similarity search based on dynamic time warping distance with FPGA , 2013, FPGA '13.