Fuzzy c-means clustering of mixed databases including numerical and nominal variables

Fuzzy c-means (FCM) clustering is an unsupervised classification method for revealing the intrinsic structure of multivariate data sets. It is, however, applicable only to databases consisting of numerical variables. For analyzing the intrinsic features of categorical data sets, many approaches to the quantification of nominal variables have been proposed. Most of them aim to construct combined plots of category quantifications and object scores. In this paper, we propose a new approach to the clustering of mixed databases that include both numerical and categorical variables. The clustering technique uses a simple FCM-type iterative algorithm that includes a quantification step. In the quantification step, the category scores are derived so that they suit FCM clustering, taking the cluster centers and memberships into account. Numerical experiments demonstrate the characteristic features of the proposed method.
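
For orientation, the following is a minimal sketch of the standard FCM iteration on purely numerical data, which is the base scheme the abstract refers to. It is not the proposed method: the quantification step for nominal variables is not specified in this abstract and is only marked by a comment. The function name `fcm`, its parameters (`m`, `n_iter`, `tol`, `seed`), and the toy data are assumptions made for illustration.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on numerical data (illustrative sketch).

    X: (n_samples, n_features) array; m > 1 is the fuzzifier.
    Returns (centers, U), where U[i, c] is the membership of sample i
    in cluster c and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized per sample.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Center update: membership-weighted means.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # A quantification step for nominal variables would be placed
        # inside this loop in the proposed method; it is not reproduced
        # here because the abstract does not give its form.
        # Squared Euclidean distances, shape (n_samples, n_clusters).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ic proportional to d_ic^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy usage: two well-separated groups of points (hypothetical data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centers, U = fcm(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment by largest membership
```

The sketch alternates the center update and the membership update until the memberships stop changing. A mixed-data variant would additionally have to map each nominal category to a numerical score before distances can be computed.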