Finding Maximal Significant Linear Representation between Long Time Series

In some time series applications, finding linear correlation between time series is important. However, a single global correlation measured over two long time series is rarely meaningful, since more often than not the two series are correlated on various segments rather than as a whole. To tackle the challenges in measuring linear correlation between two long time series, in this paper we formulate the novel problem of finding the maximal significant linear representation. The main idea is that, given two time series and a quality constraint, we want to find the longest gapped time interval on which one time series can be linearly represented by the other within the quality constraint. We develop a point-based approach that exploits a novel representation of linear correlation between time series on segments and transforms the problem into a geometric search. We present a systematic empirical study to verify the efficiency and effectiveness of the approach.
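
To make the problem statement concrete, the following is a minimal brute-force sketch. It is not the point-based approach of the paper. It rests on several assumptions that the abstract does not state: the quality constraint is taken to be a lower bound on the R^2 of a least-squares fit y ≈ a*x + b, the interval is contiguous (gaps are ignored), and the names `longest_linear_interval`, `min_r2`, and `min_len` are hypothetical.

```python
from itertools import accumulate


def _prefix(values):
    """Prefix sums with a leading 0, so sum(values[i:j]) == p[j] - p[i]."""
    return [0.0] + list(accumulate(values))


def longest_linear_interval(x, y, min_r2, min_len=3):
    """Return (start, end) of the longest contiguous interval [start, end)
    on which the least-squares fit y ~ a*x + b has R^2 >= min_r2, or None
    if no interval of at least min_len points qualifies.

    Assumption: R^2 stands in for the paper's quality constraint.
    Brute force over all intervals, O(n^2) time; illustration only.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)

    # Prefix sums let us evaluate any interval in O(1).
    px, py = _prefix(x), _prefix(y)
    pxx = _prefix(a * a for a in x)
    pyy = _prefix(b * b for b in y)
    pxy = _prefix(a * b for a, b in zip(x, y))

    def r_squared(i, j):
        m = j - i
        sx, sy = px[j] - px[i], py[j] - py[i]
        # Centered sums of squares and cross products on [i, j).
        sxx = (pxx[j] - pxx[i]) - sx * sx / m
        syy = (pyy[j] - pyy[i]) - sy * sy / m
        sxy = (pxy[j] - pxy[i]) - sx * sy / m
        if syy <= 1e-12:
            return 1.0  # y is constant: a = 0, b = const fits exactly
        if sxx <= 1e-12:
            return 0.0  # x is constant but y varies: no linear fit
        return sxy * sxy / (sxx * syy)

    # Try the longest intervals first; the first hit is a longest one.
    for length in range(n, min_len - 1, -1):
        for start in range(n - length + 1):
            if r_squared(start, start + length) >= min_r2:
                return (start, start + length)
    return None


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5, 6, 7]
    y = [9, -4, 5, 7, 9, 11, -6, 20]  # y = 2*x + 1 only on indices 2..5
    print(longest_linear_interval(x, y, min_r2=0.999))
```

On this toy input the two series are linearly related only on a short segment, so a correlation computed over the whole series would miss the relationship. The quadratic search is only meant to illustrate the objective and does not scale to long series.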
