A novel parallel scheme for fast similarity search in large time series

Similarity search is a fundamental component of time series data mining tasks such as clustering, classification, and association rule mining. Many measures have been proposed for the similarity between time series, including Euclidean distance, Manhattan distance, and dynamic time warping (DTW). Among these, DTW is regarded as the more robust similarity measure, and it can find the optimal alignment between two time series. However, because of its quadratic time and space complexity, DTW is not suitable for large time series datasets. Many improved algorithms have been proposed for DTW search in large databases, such as approximate search and exact indexed search. Unlike these modified algorithms, this paper presents a novel parallel scheme for fast similarity search based on DTW, called MRDTW (MapReduce-based DTW). The experimental results show that our approach not only retains the accuracy of the original DTW, but also greatly improves the efficiency of similarity search in large time series.
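To make the quadratic cost and the parallelization idea concrete, the following is a minimal Python sketch, not the authors' MRDTW implementation. The abstract does not describe how MRDTW partitions the data or what its map and reduce steps compute, so the functions `map_phase`, `reduce_phase`, and `nearest_series`, the `(series_id, series)` partition layout, and the absolute-difference local cost are all assumptions made for illustration. The sketch runs in a single process; in a real MapReduce deployment each partition would be handled by a separate mapper.

```python
from functools import reduce
from math import inf


def dtw_distance(a, b):
    """Classic DTW with an absolute-difference local cost.

    Fills an (n+1) x (m+1) table, hence O(n*m) time and space.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def map_phase(query, partition):
    """Hypothetical mapper: emit (distance, series_id) for one partition."""
    return [(dtw_distance(query, series), sid) for sid, series in partition]


def reduce_phase(best, candidates):
    """Hypothetical reducer: keep the candidate with the smallest distance."""
    return min([best] + candidates, key=lambda pair: pair[0])


def nearest_series(query, partitions):
    """Exact 1-nearest-neighbour search under DTW over all partitions."""
    mapped = map(lambda part: map_phase(query, part), partitions)
    return reduce(reduce_phase, mapped, (inf, None))


if __name__ == "__main__":
    partitions = [
        [("a", [5, 6, 7]), ("b", [1, 2, 2, 3])],
        [("c", [0, 0, 0])],
    ]
    print(nearest_series([1, 2, 3], partitions))
```

Because every series is still compared with the full DTW recurrence, a scheme of this shape returns the same nearest neighbour as a sequential DTW scan; only the distribution of the work changes.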
