Paper information - A clustering method for very large mixed data sets

A clustering method for very large mixed data sets

In developed countries, especially over the last decade, there has been explosive growth in the capability to generate, collect, and use very large data sets. The objects in these data sets may be described simultaneously by quantitative and qualitative attributes. At present, algorithms have been developed that can process either very large data sets (in metric spaces) or mixed (qualitative and quantitative) incomplete (missing-value) data sets, but none has been developed for very large mixed incomplete data sets. In this paper we introduce a new clustering method, named GLC+, that processes very large mixed incomplete data sets to obtain a partition into connected sets.

José Ruiz-Shulcloper | Guillermo Sánchez-Díaz
