Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data

Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time-series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time series in multi-dimensional market segments. This topic has been largely ignored in time-series and data cube research. In this study, we examine the problem of anomaly detection in multi-dimensional time-series data. We propose a time-series data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher-level, "more general" time series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm that iteratively selects subspaces in the original high-dimensional space and detects anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
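
To make the idea of scoring a segment against a higher-level, "more general" time series concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a hypothetical record layout (`dims`, `series`), assumes that the expected series of a child cell is its parent's series scaled by the child's overall share, and uses the largest absolute deviation as the anomaly score. The function names `aggregate`, `anomaly_scores`, and `top_k` are illustrative only.

```python
def aggregate(records, attrs):
    """Sum the time series of all records sharing the same values on `attrs`."""
    cells = {}
    for rec in records:
        key = tuple(rec["dims"][a] for a in attrs)
        series = rec["series"]
        if key not in cells:
            cells[key] = [0.0] * len(series)
        cells[key] = [x + y for x, y in zip(cells[key], series)]
    return cells


def anomaly_scores(records, parent_attrs, child_attrs):
    """Score each child cell by its largest deviation from a parent-derived expectation.

    Assumption (not from the paper): the expected child series is the parent
    series scaled by the child's share of the parent's total.
    `parent_attrs` must be a subset of `child_attrs`.
    """
    parents = aggregate(records, parent_attrs)
    children = aggregate(records, child_attrs)
    idx = [child_attrs.index(a) for a in parent_attrs]
    scores = {}
    for key, observed in children.items():
        parent = parents[tuple(key[i] for i in idx)]
        total = sum(parent)
        if total == 0:
            continue  # no basis for an expectation
        share = sum(observed) / total
        expected = [share * p for p in parent]
        scores[key] = max(abs(o - e) for o, e in zip(observed, expected))
    return scores


def top_k(scores, k):
    """Return the k highest-scoring cells as (cell, score) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy data: two attributes, four time steps per segment.
records = [
    {"dims": {"gender": "F", "age": "young"}, "series": [1, 1, 1, 1]},
    {"dims": {"gender": "F", "age": "old"}, "series": [1, 1, 1, 5]},
    {"dims": {"gender": "M", "age": "young"}, "series": [2, 2, 2, 2]},
    {"dims": {"gender": "M", "age": "old"}, "series": [2, 2, 2, 2]},
]
scores = anomaly_scores(records, ["gender"], ["gender", "age"])
ranked = top_k(scores, 2)
```

In this toy data the two male segments follow their parent exactly, while the two female segments deviate from it, so they rank highest. A search over many attribute subsets, as the abstract describes, would call a scoring step of this kind once per selected subspace.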
