Time Series Join on Subsequence Correlation

We consider the problem of joining two long time series based on their most correlated segments. Two time series can be joined at any location and for any length. The join locations and length provide useful knowledge about the synchrony of the two time series and have applications in many domains, including environmental monitoring, patient monitoring, and power monitoring. However, join on correlation is computationally expensive, especially when the time series are long. The naive algorithm requires O(n^4) computation, where n is the length of the time series. We propose an algorithm, named Jocor, that uses two algorithmic techniques to tackle this complexity. First, the algorithm reuses computation by caching sufficient statistics; second, it prunes unnecessary correlation computations using admissible heuristics. The algorithm runs orders of magnitude faster than the naive algorithm and enables us to join long time series as well as many small time series. We also propose a variant of Jocor for fast approximation and an extension to a GPU-based parallel method that brings the running time down to an interactive level for analytics applications. We show three independent uses of time series join on correlation that are made possible by our algorithm.
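
To make the idea of caching sufficient statistics concrete, the following Python sketch shows one way such a cache could work; it is an illustration under stated assumptions, not the Jocor implementation. The names `CorrCache` and `best_join` are hypothetical. The sketch stores prefix sums of x, x^2, y, y^2, and (per lag) of the products x[t]*y[t+lag], so that the Pearson correlation of any pair of equal-length segments is obtained in constant time instead of time proportional to the segment length. It includes no pruning, so the exhaustive search over all (i, j, m) triples still costs O(n^3).

```python
import numpy as np


def _prefix(v):
    """Prefix sums with a leading zero: p[b] - p[a] equals sum(v[a:b])."""
    return np.concatenate(([0.0], np.cumsum(v)))


class CorrCache:
    """Hypothetical helper: O(1) Pearson correlation of x[i:i+m] and y[j:j+m]."""

    def __init__(self, x, y):
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.sx, self.sxx = _prefix(self.x), _prefix(self.x ** 2)
        self.sy, self.syy = _prefix(self.y), _prefix(self.y ** 2)
        self._diag = {}  # lag -> (first valid t, prefix sums of x[t] * y[t + lag])

    def _cross(self, lag):
        # Built lazily, once per lag (one diagonal of the cross-product matrix).
        if lag not in self._diag:
            lo = max(0, -lag)
            hi = min(len(self.x), len(self.y) - lag)
            prod = self.x[lo:hi] * self.y[lo + lag:hi + lag]
            self._diag[lag] = (lo, _prefix(prod))
        return self._diag[lag]

    def corr(self, i, j, m):
        lo, p = self._cross(j - i)
        sxy = p[i - lo + m] - p[i - lo]
        sx = self.sx[i + m] - self.sx[i]
        sy = self.sy[j + m] - self.sy[j]
        vx = (self.sxx[i + m] - self.sxx[i]) - sx * sx / m
        vy = (self.syy[j + m] - self.syy[j]) - sy * sy / m
        if vx <= 0.0 or vy <= 0.0:
            return float("nan")  # constant segment: correlation undefined
        return float((sxy - sx * sy / m) / np.sqrt(vx * vy))


def best_join(x, y, min_len):
    """Exhaustive search over all (i, j, m) with m >= min_len; no pruning."""
    cache = CorrCache(x, y)
    best = (-np.inf, -1, -1, -1)  # (correlation, i, j, m)
    for i in range(len(x) - min_len + 1):
        for j in range(len(y) - min_len + 1):
            for m in range(min_len, min(len(x) - i, len(y) - j) + 1):
                r = cache.corr(i, j, m)
                if r > best[0]:  # NaN never compares greater, so it is skipped
                    best = (r, i, j, m)
    return best
```

Calling `best_join(x, y, min_len)` returns the correlation together with the start positions in `x` and `y` and the segment length. A minimum length is assumed here because very short segments can be almost perfectly correlated by chance. The prefix-sum formulas are also prone to cancellation error on long series with large offsets, so a production version would need a more numerically careful formulation.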
