DIVCLUS-T: A monothetic divisive hierarchical clustering method

DIVCLUS-T is a divisive hierarchical clustering algorithm based on a monothetic bipartitional approach: each division splits a cluster in two according to a question on a single variable, so the dendrogram of the hierarchy can be read as a decision tree. It is designed for either numerical or categorical data. Like the Ward agglomerative hierarchical clustering algorithm and the k-means partitioning algorithm, it is based on the minimization of the inertia criterion. However, unlike Ward and k-means, it provides a simple and natural interpretation of the clusters. The price that DIVCLUS-T pays, by construction, in terms of inertia for this additional interpretability is studied by applying the three algorithms to six databases from the UCI Machine Learning repository.
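
To make the idea of a monothetic bipartition concrete, the following is a minimal sketch of a single division step for numerical data only. It is not the authors' implementation: the function names `inertia` and `best_monothetic_split`, the use of midpoints between consecutive distinct values as candidate cuts, and the exhaustive search are all illustrative assumptions. It looks for the binary question "is variable j <= c?" whose two resulting clusters have the smallest total within-cluster inertia.

```python
import numpy as np


def inertia(X):
    """Within-cluster inertia: sum of squared distances to the centroid."""
    if len(X) == 0:
        return 0.0
    return float(((X - X.mean(axis=0)) ** 2).sum())


def best_monothetic_split(X):
    """Search all binary questions 'x_j <= c' on a numerical data matrix X.

    Returns (j, c, w): the variable index, the cut value, and the total
    within-cluster inertia of the resulting bipartition. Hypothetical
    helper for illustration only; unit weights are assumed.
    """
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])  # sorted distinct values of variable j
        # Assumed candidate cuts: midpoints between consecutive values.
        for c in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= c
            w = inertia(X[left]) + inertia(X[~left])
            if w < best[2]:
                best = (j, float(c), w)
    return best


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
j, c, w = best_monothetic_split(X)
print(j, c, w)  # → 0 5.0 1.0
```

Because the chosen split is defined by one variable and one cut value, each division can be labelled with a readable rule (here, "variable 0 <= 5.0"), which is what allows the resulting hierarchy to be read as a decision tree. Applying such a step recursively to the clusters obtained would give a divisive hierarchy; how the next cluster to split is chosen, and how categorical variables are handled, are not covered by this sketch.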
