Tensor Analysis on Multi-aspect Streams

Data stream values are often associated with multiple aspects. For example, each value from an environmental sensor may have an associated type (e.g., temperature or humidity) as well as a location. Besides the timestamp, type and location are two additional aspects. How should such streams be modeled? How can patterns be found simultaneously within and across the multiple aspects? How can this be done incrementally, in a streaming fashion? In this paper, we address these problems through a general data model, tensor streams, and an effective algorithmic framework, window-based tensor analysis (WTA). Two variations of WTA, independent-window tensor analysis (IW) and moving-window tensor analysis (MW), are presented and evaluated extensively on real data sets. Finally, we illustrate one important application of WTA, Multi-Aspect Correlation Analysis (MACA), and demonstrate its effectiveness on an environmental monitoring application.
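To make the data model concrete, the following is a minimal sketch, not the paper's implementation. It assumes each timestamp delivers a location-by-type matrix of readings and that a window is simply the most recent `window_size` such matrices stacked into a time x location x type tensor. The names `tensor_windows`, `unfold`, and `window_size`, and the column ordering used in the unfolding, are illustrative assumptions. The sketch covers only how a window is formed and matricized along one aspect; it does not implement the IW or MW analysis itself.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

Matrix = List[List[float]]  # one timestamp: rows = locations, columns = sensor types
Tensor3 = List[Matrix]      # one window: time x location x type


def tensor_windows(stream: Iterable[Matrix], window_size: int) -> Iterator[Tensor3]:
    """Yield the most recent `window_size` snapshots as a 3rd-order tensor.

    Hypothetical helper: a new window is emitted for every arriving
    snapshot once the buffer is full.
    """
    buf: Deque[Matrix] = deque(maxlen=window_size)  # oldest snapshot drops out automatically
    for snapshot in stream:
        buf.append(snapshot)
        if len(buf) == window_size:
            # Copy rows so later windows do not alias earlier ones.
            yield [[row[:] for row in m] for m in buf]


def unfold(window: Tensor3, mode: int) -> Matrix:
    """Matricize a time x location x type tensor along one mode.

    mode 0 = time, 1 = location, 2 = type. Rows index the chosen mode;
    the column ordering is one convention among several (an assumption).
    """
    T, L, K = len(window), len(window[0]), len(window[0][0])
    if mode == 0:
        return [[window[t][l][k] for l in range(L) for k in range(K)] for t in range(T)]
    if mode == 1:
        return [[window[t][l][k] for t in range(T) for k in range(K)] for l in range(L)]
    if mode == 2:
        return [[window[t][l][k] for t in range(T) for l in range(L)] for k in range(K)]
    raise ValueError("mode must be 0, 1, or 2")
```

Unfolding along the location mode, for example, gives one row per sensor location whose entries span all timestamps and types in the window; a matrix of this shape is the kind of input a per-aspect subspace analysis would work on.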
