R-Map: Mapping Categorical Data for Clustering and Visualization Based on Reference Sets

In this paper, we propose a framework that maps categorical data into a numerical data space via a reference set, so that existing numerical clustering algorithms can be applied directly to the generated image data set and the data can be visualized. Using statistical theory, we analyze the framework and give the conditions under which the data mapping is efficient and yet preserves a flexible property of the original data, namely that data points within the same cluster are more similar to one another. The algorithm is simple and is effective under certain conditions. The experimental evaluation on numerous categorical data sets shows that it not only outperforms the related data mapping approaches but also beats some categorical clustering algorithms in terms of both effectiveness and efficiency.
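To make the idea of mapping through a reference set concrete, the following is a minimal sketch rather than the paper's actual procedure. It assumes that each categorical record is mapped to a vector whose i-th coordinate is its simple-matching (Hamming) distance to the i-th reference record. The distance measure, the hand-picked reference records, and the names `mismatch_count` and `map_to_reference_space` are all illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Sequence

Record = Sequence[str]


def mismatch_count(a: Record, b: Record) -> int:
    """Count the attributes on which two categorical records differ.

    Assumption: simple matching (Hamming) distance stands in for whatever
    dissimilarity the framework actually uses.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def map_to_reference_space(
    data: Sequence[Record], references: Sequence[Record]
) -> List[List[float]]:
    """Map each record to one numerical coordinate per reference record."""
    return [
        [float(mismatch_count(row, ref)) for ref in references]
        for row in data
    ]


# Toy data: two intuitive groups (red/small vs. blue/large).
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
]

# Hypothetical reference set, chosen by hand for illustration only.
references = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
]

image = map_to_reference_space(data, references)
for row, point in zip(data, image):
    print(row, "->", point)
```

In this toy case the first two records land near each other in the two-dimensional image space, as do the last two, so a numerical clustering algorithm or a scatter plot could separate the groups. Whether this holds in general depends on the conditions the paper analyzes.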