Methods for Clustering Categorical and Mixed Data: An Overview and New Algorithms

Methods for clustering categorical and mixed data are considered. Dissimilarities for this purpose are reviewed, and classes of algorithms are discussed according to the corresponding classes of similarities. Details of several algorithms are then given, including agglomerative hierarchical clustering, K-means and related methods such as K-medoids and K-modes, and methods of network clustering. How combinations of existing ideas lead to new algorithms is also discussed.
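As a rough illustration of how a dissimilarity for categorical data combines with a K-means-style iteration, the following is a minimal Python sketch of K-modes. It assumes the simple matching dissimilarity (the number of attributes on which two records disagree) and uses the per-attribute mode as the cluster representative. The function names, the toy records, and the choice of initial modes are hypothetical and are not taken from the paper.

```python
from collections import Counter

def simple_matching(x, y):
    """Count the attributes on which two categorical records disagree."""
    return sum(a != b for a, b in zip(x, y))

def column_modes(records):
    """Return the most frequent category of each attribute (a K-modes center)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, init_modes, max_iter=20):
    """Alternate assignment and mode update until the labels stop changing."""
    modes = list(init_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda k: simple_matching(r, modes[k]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute the mode of every non-empty cluster.
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:
                modes[k] = column_modes(members)
    return labels, modes

# Hypothetical toy data: (color, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
# Assumed initialization: the first and fourth records serve as initial modes.
labels, modes = k_modes(records, [records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Replacing the mode update with a search for the member that minimizes the total dissimilarity to its cluster would give a K-medoids-style variant under the same dissimilarity.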
