Core-Based Clustering Techniques

Starting from model-based clustering, simple techniques based on cores are proposed. A core is a dense region in the high-dimensional space that can be represented, for example, by its most typical observation, by its centroid or, more generally, by assigning weight functions to the observations. Well-known cluster analysis techniques such as the partitional K-Means method or the hierarchical Ward method are useful for discovering partitions or hierarchies in the underlying data. Here these methods are generalised in two ways: first by using weighted observations, and second by allowing clusters of different volumes. A more general K-Means approach based on pairwise distances is then recommended. Simulation studies are carried out to compare the new clustering techniques with the well-known ones. Moreover, a successful application is presented in which the task is to discover clusters with quite different numbers of observations in a high-dimensional space.
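
The idea of a K-Means variant that works on pairwise distances with weighted observations can be illustrated with a minimal sketch. It is not the authors' algorithm; it only assumes that the distances are squared Euclidean, in which case the squared distance from observation i to the weighted centroid of a cluster C with total weight M equals (1/M) Σ_j w_j d_ij − (1/(2M²)) Σ_j Σ_l w_j w_l d_jl, where the sums run over the members of C. The function name `pairwise_kmeans` and all parameter names are hypothetical.

```python
def pairwise_kmeans(dist, weights, labels, max_iter=100):
    """Weighted K-Means sketch driven only by squared pairwise distances.

    dist[i][j] : squared Euclidean distance between observations i and j
    weights[i] : non-negative weight of observation i
    labels     : initial cluster index (0..k-1) for each observation
    """
    n = len(dist)
    k = max(labels) + 1
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        mass = [sum(weights[i] for i in m) for m in members]

        # Per-cluster correction term: sum_{j,l} w_j w_l d_jl / (2 M^2)
        scatter = []
        for c in range(k):
            if mass[c] == 0:
                scatter.append(0.0)
                continue
            s = sum(weights[j] * weights[l] * dist[j][l]
                    for j in members[c] for l in members[c])
            scatter.append(s / (2.0 * mass[c] ** 2))

        # Reassign each observation to the cluster with the nearest centroid,
        # computed from distances alone (no coordinates are needed).
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c in range(k):
                if mass[c] == 0:
                    continue  # skip empty clusters
                d = sum(weights[j] * dist[i][j] for j in members[c]) / mass[c]
                d -= scatter[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)

        if new_labels == labels:  # converged: no observation changed cluster
            break
        labels = new_labels
    return labels


# Toy usage: four points on a line, unit weights, a poor starting partition.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[(a - b) ** 2 for b in points] for a in points]
result = pairwise_kmeans(dist, [1.0] * 4, [0, 1, 0, 1])
print(result)  # [0, 0, 1, 1]
```

Because only the distance matrix and the weights enter the computation, the same loop runs unchanged when the observations themselves are unavailable or when the weights are used to emphasise typical observations over outlying ones.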
