Efficient Bottom-Up Discovery of Multi-scale Time Series Correlations Using Mutual Information

Recent advances in computing and IoT technology have enabled the daily generation of enormous amounts of time series data. These time series must be analyzed to create value. A fundamental type of analysis is finding temporal correlations between given sets of time series. A robust method for this problem should have several properties. First, it should have a strong theoretical foundation. Second, since temporal correlations can occur at different temporal scales, e.g., sub-second versus weekly, it should be able to discover correlations at multiple temporal scales. Finally, it should be efficient and scalable. This paper presents an approach to searching for synchronous correlations in big time series that has all three properties: the proposed method (i) uses mutual information, a measure from information theory, providing a strong theoretical foundation, (ii) discovers correlations at multiple temporal scales, and (iii) works in an efficient, bottom-up fashion, making it scalable to large datasets. Our experiments verify that the proposed approach can identify various types of correlation relations across multiple temporal scales while running an order of magnitude faster than state-of-the-art techniques.
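
To make the role of mutual information concrete, the following minimal Python sketch computes a plug-in estimate of mutual information between two already-discretized series and evaluates it over non-overlapping windows of different sizes. It is an illustration only, not the paper's bottom-up search algorithm: the function names `mutual_information` and `windowed_mi`, the fixed non-overlapping windows, and the toy data are all assumptions introduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length symbol sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def windowed_mi(xs, ys, window):
    """Mutual information of each consecutive non-overlapping window (hypothetical helper)."""
    return [
        mutual_information(xs[i:i + window], ys[i:i + window])
        for i in range(0, len(xs) - window + 1, window)
    ]


# Toy data (assumed): the series agree in the first half and are unrelated in the second.
xs = [0, 1, 0, 1, 0, 0, 1, 1]
ys = [0, 1, 0, 1, 0, 1, 0, 1]

for window in (4, 8):
    print(window, windowed_mi(xs, ys, window))
```

On this toy input the window of size 4 separates a fully dependent first half from an independent second half, whereas the single window of size 8 reports only a weak overall dependence. This shows why the temporal scale at which a correlation is measured matters.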
