Parallelization of Similarity Search in Large Time Series Databases

This paper proposes an efficient parallel algorithm for searching large time series databases. Existing parallel algorithms for such tasks generally rely on multidimensional tree structures and are therefore bound by the performance of those trees. A number of serial algorithms have also been proposed over the past decade. Most of them apply a transformation to reduce dimensionality and then build an index over the reduced representation to speed up the search; this approach, too, suffers from performance degradation. This work develops an algorithm that processes range queries and k-nearest neighbor (k-NN) queries over time series databases in parallel, assuming a shared-nothing multiprocessor architecture. Both analytical and experimental results show that the new approach achieves near-linear scaleup and linear speedup while requiring little more effort than a non-indexed sequential scan, making it a viable alternative to index-based approaches.
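The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: a non-indexed sequential scan, spread over a shared-nothing cluster, that answers range and k-NN queries. Everything concrete here is an assumption rather than the paper's method: round-robin partitioning, Euclidean distance with early abandoning, a coordinator that merges per-node answers, and threads standing in for independent nodes (a real deployment would run separate processes exchanging messages, for example via MPI). The helper names `partition`, `bounded_distance`, `local_range`, `local_knn`, `parallel_range` and `parallel_knn` are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Series = Sequence[float]
Partition = List[Tuple[int, Series]]  # (series id, values) pairs held by one node


def partition(db: List[Series], num_nodes: int) -> List[Partition]:
    """Round-robin declustering of the database across nodes (assumed scheme)."""
    parts: List[Partition] = [[] for _ in range(num_nodes)]
    for sid, series in enumerate(db):
        parts[sid % num_nodes].append((sid, series))
    return parts


def bounded_distance(a: Series, b: Series, bound: float) -> float:
    """Euclidean distance that gives up (returns inf) once it must exceed `bound`."""
    limit = bound * bound
    acc = 0.0
    for x, y in zip(a, b):
        acc += (x - y) ** 2
        if acc > limit:  # early abandoning: this series can no longer qualify
            return math.inf
    return math.sqrt(acc)


def local_range(part: Partition, query: Series, eps: float) -> List[int]:
    """Sequential scan of one node's partition for a range query."""
    return [sid for sid, s in part if bounded_distance(query, s, eps) <= eps]


def local_knn(part: Partition, query: Series, k: int) -> List[Tuple[float, int]]:
    """Sequential scan of one node's partition, keeping its k best (distance, id) pairs."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # max-heap on distance via negated keys
    for sid, s in part:
        # Once k candidates are held, the current k-th distance prunes the scan.
        bound = -heap[0][0] if len(heap) == k else math.inf
        d = bounded_distance(query, s, bound)
        if len(heap) < k:
            heapq.heappush(heap, (-d, sid))
        elif d < bound:
            heapq.heapreplace(heap, (-d, sid))
    return sorted((-neg_d, sid) for neg_d, sid in heap)


def parallel_range(parts: List[Partition], query: Series, eps: float) -> List[int]:
    """Broadcast the query, scan every partition concurrently, then union the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        per_node = pool.map(lambda p: local_range(p, query, eps), parts)
        return sorted(sid for hits in per_node for sid in hits)


def parallel_knn(parts: List[Partition], query: Series, k: int) -> List[Tuple[float, int]]:
    """Each node returns its local k-NN; the coordinator keeps the global k smallest."""
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        per_node = pool.map(lambda p: local_knn(p, query, k), parts)
        candidates = [c for local in per_node for c in local]
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    db = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [0.0, 0.2, 0.0]]
    parts = partition(db, num_nodes=2)
    q = [0.0, 0.0, 0.0]
    print(parallel_range(parts, q, eps=0.5))                # [0, 2, 4]
    print([sid for _, sid in parallel_knn(parts, q, k=2)])  # [0, 2]
```

The k-NN merge is correct because every node returns its own k best candidates, so the global k nearest neighbors must lie in the union of the local answers. Only k entries per node travel to the coordinator, which keeps communication small relative to the scan itself.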
