Unsupervised clustering on dynamic databases

Clustering algorithms typically assume that the available data constitute a random sample from a stationary distribution. As data accumulate over time, however, the underlying process that generates them can change. Algorithms that can extract clustering rules in non-stationary environments are therefore necessary. In this paper, we present an extension of the k-windows algorithm that can track the evolution of cluster models in dynamically changing databases without significant computational overhead. Experiments show that the k-windows algorithm can effectively and efficiently identify changes in the pattern structure.
