Paper information

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data

Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time-series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time series in multi-dimensional market segments. This topic has been largely ignored in time-series and data cube research. In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose a time-series data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher-level, "more general" time series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm that iteratively selects subspaces in the original high-dimensional space and detects anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
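To make the idea of "expected values derived from higher-level time series" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that a cell's expected series is its parent's series scaled by the cell's overall share of the parent, and that the anomaly score is the largest absolute gap between observed and expected values. Both choices, the toy `RECORDS`, and every function name are hypothetical. The sketch also enumerates all cells exhaustively, whereas the abstract describes an approximate search that iteratively selects subspaces.

```python
from itertools import combinations

# Hypothetical toy records: (segment attributes, sales per time step).
RECORDS = [
    ({"gender": "F", "age": "young"}, [10, 10, 10, 10]),
    ({"gender": "F", "age": "old"},   [10, 10, 10, 10]),
    ({"gender": "M", "age": "young"}, [10, 10, 40, 10]),
    ({"gender": "M", "age": "old"},   [10, 10, 10, 10]),
]

def aggregate(records, cell):
    """Sum the series of all records matching the cell's attribute values."""
    total = [0.0] * len(records[0][1])
    for attrs, series in records:
        if all(attrs[k] == v for k, v in cell.items()):
            for t, x in enumerate(series):
                total[t] += x
    return total

def expected_from_parent(child, parent):
    """Assumed model: parent series scaled by the child's overall share.

    Assumes the parent series has a nonzero total.
    """
    share = sum(child) / sum(parent)
    return [share * x for x in parent]

def anomaly_score(child, parent):
    """Assumed score: largest absolute gap between observed and expected."""
    expected = expected_from_parent(child, parent)
    return max(abs(o - e) for o, e in zip(child, expected))

def top_k_anomalies(records, k):
    """Score every cell against each parent that drops one attribute.

    Exhaustive enumeration for illustration only; it does not scale to
    high-dimensional data.
    """
    dims = sorted(records[0][0])
    cells = set()
    for attrs, _ in records:
        for r in range(1, len(dims) + 1):
            for subset in combinations(dims, r):
                cells.add(tuple((d, attrs[d]) for d in subset))
    scored = []
    for cell in cells:
        child = aggregate(records, dict(cell))
        for dropped in cell:
            parent_cell = tuple(p for p in cell if p != dropped)
            parent = aggregate(records, dict(parent_cell))
            scored.append((anomaly_score(child, parent), cell, parent_cell))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]
```

Calling `top_k_anomalies(RECORDS, 3)` returns the three highest-scoring `(score, cell, parent_cell)` triples. A cell whose series is an exact scaled copy of its parent's scores zero under this model, which is why only cells affected by the spike in the third time step rank highly.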

Jiawei Han | Xiaolei Li
