Paper information - Outlier Detection with Streaming Dyadic Decomposition

In this work we introduce a new algorithm for detecting outliers in streaming data in R^n. The basic idea is to compute a dyadic decomposition of the streaming data into cubes in R^n. A dyadic decomposition is obtained by recursively bisecting the cube in which the data lies; when it is computed in a streaming setting, we call it a streaming dyadic decomposition. If we view the streaming dyadic decomposition as a tree with a fixed, sufficiently large maximum depth, then outliers are naturally defined by cubes that contain a small number of points, counting either the cube alone or the cube together with its neighboring cubes. We discuss some properties of detecting outliers with streaming dyadic decomposition and present experimental results on real and artificial data sets.
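The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `StreamingDyadicSketch`, the parameters `max_depth` and `min_count`, and the assumption that the data has already been scaled into the unit cube [0, 1)^n are all hypothetical choices made for illustration. Each arriving point increments the count of the dyadic cube containing it at every level of the tree, and a point is flagged when its finest-level cube, optionally together with the adjacent cubes, holds fewer than `min_count` points.

```python
from collections import Counter
from itertools import product


class StreamingDyadicSketch:
    """Illustrative point counts per dyadic cube of [0, 1)^n, levels 0..max_depth."""

    def __init__(self, dim, max_depth):
        self.dim = dim
        self.max_depth = max_depth
        # counts[k] maps a cube index (tuple of ints) at level k to its point count.
        self.counts = [Counter() for _ in range(max_depth + 1)]

    def leaf(self, point):
        # Bisecting each axis max_depth times yields 2**max_depth intervals per axis.
        side = 1 << self.max_depth
        return tuple(min(int(x * side), side - 1) for x in point)

    def add(self, point):
        # One streaming update: increment the leaf cube and all of its ancestors.
        cell = self.leaf(point)
        for level in range(self.max_depth, -1, -1):
            self.counts[level][cell] += 1
            cell = tuple(c >> 1 for c in cell)  # index of the parent cube

    def neighborhood_count(self, cell):
        # Sum the counts of the leaf cube and every cube adjacent to it.
        leaves = self.counts[self.max_depth]
        total = 0
        for offset in product((-1, 0, 1), repeat=self.dim):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            total += leaves.get(neighbor, 0)
        return total

    def is_outlier(self, point, min_count, use_neighbors=True):
        # Flag the point if its cube (or cube plus neighbors) is sparsely populated.
        cell = self.leaf(point)
        if use_neighbors:
            count = self.neighborhood_count(cell)
        else:
            count = self.counts[self.max_depth].get(cell, 0)
        return count < min_count


# Usage: a dense cluster near (0.25, 0.25) and one isolated point.
sketch = StreamingDyadicSketch(dim=2, max_depth=3)
for i in range(10):
    for j in range(10):
        sketch.add(((20 + i) / 100, (20 + j) / 100))
sketch.add((0.9, 0.9))
cluster_flag = sketch.is_outlier((0.22, 0.22), min_count=5)
isolated_flag = sketch.is_outlier((0.9, 0.9), min_count=5)
```

Because only counters are stored, each update touches `max_depth + 1` dictionary entries and the memory footprint depends on the number of occupied cubes rather than on the number of points seen. The neighborhood test examines 3^n cubes, so this sketch is only practical in low dimensions.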

Robert L. Grossman | Chetan Gupta

[1] Erich Schikuta,et al. Grid-clustering: an efficient hierarchical clustering method for very large data sets , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[2] Ping Chen,et al. Using the fractal dimension to cluster datasets , 2000, KDD '00.

[3] Sridhar Ramaswamy,et al. Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[4] Charles Elkan,et al. Scalability for clustering algorithms revisited , 2000, SKDD.

[5] Raymond T. Ng,et al. Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[6] Jeffrey Scott Vitter,et al. Mining deviants in time series data streams , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management.

[7] J. Cooper. Singular Integrals and Differentiability Properties of Functions , 1973 .

[8] Sudipto Guha,et al. CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[9] E. Schikuta. Grid-clustering: a fast hierarchical clustering method for very large data sets , 1993 .

[10] Sudipto Guha,et al. Clustering Data Streams , 2000, FOCS.

[11] Yee Leung,et al. Clustering by Scale-Space Filtering , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[12] Vipin Kumar,et al. Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[13] Stephen J. Roberts,et al. Parametric and non-parametric unsupervised cluster analysis , 1997, Pattern Recognit..

[14] G. Krishna,et al. A heuristic clustering algorithm using union of overlapping pattern-cells , 1979, Pattern Recognit..

[15] Jiawei Han,et al. Efficient and Effective Clustering Methods for Spatial Data Mining , 1994, VLDB.

[16] Paul S. Bradley,et al. Scaling Clustering Algorithms to Large Databases , 1998, KDD.

[17] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[18] Aidong Zhang,et al. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases , 1998, VLDB.