Two heads better than one: pattern discovery in time-evolving multi-aspect data

Data stream values are often associated with multiple aspects. For example, each value observed at a given time-stamp from environmental sensors may have an associated type (e.g., temperature or humidity) as well as a location. Time-stamp, type and location are the three aspects, which can be modeled using a tensor (a high-order array). However, the time aspect is special: it has a natural ordering, and successive time-ticks usually have correlated values. Standard multiway analysis ignores this structure. To capture it, we propose 2 Heads Tensor Analysis (2-heads), which provides a qualitatively different treatment of time. Unlike most existing approaches, which use a PCA-like summarization scheme for all aspects, 2-heads combines classic multilinear analysis on the non-time aspects with wavelets on the time aspect, leading to a powerful mining tool. 2-heads has several further advantages: (a) it can be computed incrementally in a streaming fashion, (b) it has a provable error guarantee, and (c) it achieves a significant compression ratio compared with competitors. Finally, we show experiments on real datasets and illustrate how 2-heads reveals interesting trends in the data. This is an extended abstract of an article published in the Data Mining and Knowledge Discovery journal.
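To make the idea concrete, the following is a minimal sketch of treating time differently from the other aspects, not the authors' algorithm. It assumes a dense NumPy tensor of shape (time, type, location) whose time length is a power of two, uses a truncated SVD of each mode unfolding as the PCA-like summary of the type and location aspects, and applies an orthonormal Haar transform along time. The names `haar_dwt`, `leading_basis`, and `two_heads_sketch` are hypothetical and introduced only for illustration.

```python
import numpy as np


def haar_dwt(x):
    """Orthonormal Haar transform along axis 0 (length must be a power of two)."""
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    assert n > 0 and n & (n - 1) == 0, "time length must be a power of two"
    details = []
    while n > 1:
        even, odd = x[0:n:2], x[1:n:2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        x = (even + odd) / np.sqrt(2.0)              # approximation (smooth part)
        n //= 2
    # Coarsest approximation first, then details from coarse to fine.
    return np.concatenate([x] + details[::-1], axis=0)


def leading_basis(tensor, mode, rank):
    """Top-`rank` left singular vectors of the mode-`mode` unfolding."""
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank]


def two_heads_sketch(tensor, rank_type, rank_loc):
    """PCA-like projection on type and location, Haar wavelets on time."""
    u_type = leading_basis(tensor, 1, rank_type)
    u_loc = leading_basis(tensor, 2, rank_loc)
    # Project the two non-time aspects onto their leading subspaces.
    core = np.einsum("tij,ia,jb->tab", tensor, u_type, u_loc)
    # Wavelet-transform every projected time series along the time axis.
    coeffs = haar_dwt(core)
    return coeffs, u_type, u_loc


# Synthetic data for illustration only: 8 time-ticks, 3 types, 4 locations.
rng = np.random.default_rng(0)
tensor = rng.standard_normal((8, 3, 4))
coeffs, u_type, u_loc = two_heads_sketch(tensor, rank_type=2, rank_loc=2)
full_coeffs, _, _ = two_heads_sketch(tensor, rank_type=3, rank_loc=4)
```

In this sketch, compression would come from keeping only the largest-magnitude entries of `coeffs`. Because both the projections and the Haar transform are orthonormal, the full-rank call preserves the total energy of the input, which is what makes an error bound on discarded coefficients straightforward in this simplified setting.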
