Mutual Information Based on Kullback-Leibler Divergence for Clustering Categorical Data

Clustering is the process of grouping a set of objects into clusters so that similar objects fall into the same cluster and dissimilar objects fall into different clusters. The fuzzy k-means algorithm is a clustering algorithm that partitions data into k clusters, using Euclidean distance as its distance function. This research discusses clustering categorical data using Fuzzy k-Means Kullback-Leibler Divergence. The distance between a data point and a cluster center is determined using mutual information, that is, the Kullback-Leibler divergence between the joint distribution and the product of the two marginal distributions. An extensive theoretical analysis was performed to show the effectiveness of the proposed method. In addition, the proposed method was compared with the Fuzzy Centroid and Fuzzy k-Partition approaches in terms of response time and clustering accuracy on several datasets from the UCI Machine Learning Repository. The experimental results show that, for clustering categorical data, the proposed algorithm provides good results in both clustering quality and accuracy compared with Fuzzy Centroid and Fuzzy k-Partition.
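
The identity behind the distance measure, that mutual information equals the Kullback-Leibler divergence between a joint distribution and the product of its marginals, I(X;Y) = sum over (x, y) of p(x,y) log[ p(x,y) / (p(x) p(y)) ], can be illustrated with a minimal Python sketch. It is not the proposed clustering algorithm: the abstract does not say how the distributions between a data point and a cluster center are built, so the sketch simply assumes two categorical attributes observed as paired samples and estimates all probabilities from empirical frequencies. The function name `mutual_information` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in nats as KL(p(x,y) || p(x)p(y)) from paired samples.

    Assumption: probabilities are plain empirical frequencies of the
    categorical values; no smoothing is applied.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        ratio = c * n / (count_x[x] * count_y[y])
        mi += p_xy * math.log(ratio)
    return mi


# Two attributes that always agree share log(2) nats of information;
# two attributes whose values are independent share none.
dependent = mutual_information(list("aabb"), list("aabb"))
independent = mutual_information(list("aabb"), list("uvuv"))
```

Only pairs that actually occur contribute to the sum, so the usual convention 0 log 0 = 0 is respected without special handling.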