Low Redundancy Estimation of Correlation Matrices for Time Series Using Triangular Bounds

The dramatic increase in the availability of large collections of time series requires new approaches for scalable time series analysis. Computing correlations for all pairs of time series is a fundamental first step in analyzing such data, but it is particularly hard for large collections because of its quadratic complexity. State-of-the-art approaches focus either on efficiently approximating correlations larger than a hard threshold or on compressing fully computed correlation matrices after the fact. In contrast, we aim to estimate the full pairwise correlation structure without computing and storing all pairwise correlations. We introduce the novel problem of low redundancy estimation for correlation matrices: capturing the complete correlation structure with as few parameters and correlation computations as possible. We propose a novel estimation algorithm that is very efficient and comes with formal approximation guarantees. Our algorithm avoids computing redundant blocks of the correlation matrix, which drastically reduces the time and space complexity of estimation. In an extensive empirical evaluation on a large variety of datasets, we show that our approach obtains high-quality estimates with substantially reduced space requirements.
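
The abstract does not spell out the triangular bounds named in the title. As a rough illustration only, the sketch below assumes they refer to the standard interval that follows from positive semidefiniteness of a 3x3 correlation matrix: given corr(x, z) and corr(y, z), corr(x, y) must lie within r_xz * r_yz plus or minus sqrt((1 - r_xz^2)(1 - r_yz^2)). The function names `pearson` and `triangular_bounds` and the synthetic data are hypothetical, and this is not the paper's estimation algorithm.

```python
import numpy as np


def pearson(x, y):
    """Sample Pearson correlation of two equally long 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])


def triangular_bounds(r_xz, r_yz):
    """Interval that must contain corr(x, y), given corr(x, z) and corr(y, z).

    Assumed form: the bound implied by a positive semidefinite
    3x3 correlation matrix (not taken from the paper itself).
    """
    slack = np.sqrt(max(0.0, 1.0 - r_xz**2) * max(0.0, 1.0 - r_yz**2))
    center = r_xz * r_yz
    return max(-1.0, center - slack), min(1.0, center + slack)


# Synthetic example: x and y are both noisy copies of a pivot series z.
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
x = z + 0.1 * rng.standard_normal(500)
y = z + 0.1 * rng.standard_normal(500)

# Two computed correlations against the pivot bound the third one.
lower, upper = triangular_bounds(pearson(x, z), pearson(y, z))
r_xy = pearson(x, y)
print(f"corr(x, y) = {r_xy:.4f} lies in [{lower:.4f}, {upper:.4f}]")
```

When both series correlate strongly with the same pivot, the interval is narrow, so the pairwise correlation can be estimated from the two pivot correlations instead of being computed. When the pivot correlations are near zero, the interval widens to [-1, 1] and carries no information.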
