Why so many clustering algorithms: a position paper

We argue that there are many clustering algorithms because the notion of "cluster" cannot be precisely defined. Clustering is in the eye of the beholder; as a result, researchers have proposed many inductive principles and models, whose corresponding optimization problems can only be solved approximately, and by an even larger number of algorithms. Therefore, any comparison of clustering algorithms must rest on a careful understanding of the inductive principles involved.
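To make the thesis concrete, the following minimal sketch (our own illustration, not a method from the works discussed here) clusters one hypothetical one-dimensional data set into k = 2 groups under two inductive principles. The first minimizes the within-cluster sum of squared errors, which is the k-means criterion. The second is single linkage, which cuts the longest edge of the minimum spanning tree. In one dimension with k = 2 both problems can be solved exactly by scanning the split points of the sorted data, so any disagreement comes from the principle itself rather than from approximate optimization. The data values and the function names `sse`, `two_means_1d` and `single_linkage_1d` are assumptions chosen for illustration.

```python
# Two inductive principles, one data set: a minimal sketch on hypothetical data.
# Principle A: minimize within-cluster sum of squared errors (the k-means criterion).
# Principle B: single linkage, i.e. cut the longest edge of the minimum spanning tree.
# For 1-D data and k = 2 both are solved exactly by scanning split points in sorted order.

def sse(points):
    """Sum of squared deviations of the points from their mean."""
    if not points:
        return 0.0
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

def two_means_1d(points):
    """Exact 2-means in 1-D: optimal clusters are contiguous in sorted order."""
    xs = sorted(points)
    best = None
    for i in range(1, len(xs)):
        left, right = xs[:i], xs[i:]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best[1], best[2]

def single_linkage_1d(points):
    """Single linkage with k = 2 in 1-D: the MST links sorted neighbours,
    so removing its longest edge means cutting the largest gap."""
    xs = sorted(points)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return xs[:cut], xs[cut:]

# Hypothetical data: an evenly spaced chain 0..10 plus one point after a gap of 2.
data = list(range(11)) + [12]

print("k-means criterion:", two_means_1d(data))
print("single linkage:   ", single_linkage_1d(data))
```

On this data the k-means criterion splits the evenly spaced chain near its middle, giving ([0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 12]), while single linkage isolates the point beyond the only gap, giving ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [12]). Neither answer is wrong; each is optimal for the principle that defines it.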
