A New Weight Based Density Peaks Clustering Algorithm for Numerical and Categorical Data

Discovering the underlying group structure of objects is of crucial importance in data mining. Most existing clustering approaches apply only to purely numerical or purely categorical data. Only a few recent approaches can handle both numerical and categorical attributes, and these often incur a high computational cost. To cluster data with both numerical and categorical attributes efficiently, we propose a new approach consisting of the following steps. First, we design a measure of the importance of each categorical attribute and, based on this measure, propose a method for generating the weight of each categorical attribute. Then, we propose a unified distance metric that combines the distance for the numerical part with the weighted distance for the categorical part. Furthermore, by incorporating the new weights into the method in [1], we present an improved density peaks clustering algorithm. Finally, experimental results demonstrate the efficiency of the proposed approach.
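To make the pipeline above concrete, the following is a minimal sketch rather than the proposed algorithm itself. The abstract does not specify the importance measure, so the categorical weights are simply passed in as an input. The choice of Euclidean distance for the numerical part, a weighted mismatch count for the categorical part, a cutoff-based local density, and all function and parameter names (`mixed_distance`, `density_peaks`, `d_c`, `n_clusters`) are assumptions made for illustration.

```python
import numpy as np

def mixed_distance(x_num, x_cat, weights):
    """Pairwise distance: Euclidean on the numerical columns plus a
    weighted mismatch count on the categorical columns (assumed form)."""
    x_num = np.asarray(x_num, dtype=float)
    x_cat = np.asarray(x_cat)
    weights = np.asarray(weights, dtype=float)
    d_num = np.sqrt(((x_num[:, None, :] - x_num[None, :, :]) ** 2).sum(axis=2))
    mismatch = (x_cat[:, None, :] != x_cat[None, :, :]).astype(float)
    d_cat = (mismatch * weights).sum(axis=2)
    return d_num + d_cat

def density_peaks(dist, d_c, n_clusters):
    """Density-peaks-style clustering on a precomputed distance matrix."""
    n = dist.shape[0]
    # Local density: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: use its largest distance to any other point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]  # points ranked as denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centers: points with the largest product of density and separation.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta

# Toy data: one numerical and one categorical attribute, two obvious groups.
x_num = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
x_cat = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
dist = mixed_distance(x_num, x_cat, weights=[1.0])
labels, rho, delta = density_peaks(dist, d_c=0.15, n_clusters=2)
print(labels)
```

In this sketch a larger weight makes a mismatch on that categorical attribute count more toward the total distance, which is the role the abstract assigns to the attribute weights. How the weights are derived from the importance measure is left open here.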

[1] Deng Cai, et al. Laplacian Score for Feature Selection, 2005, NIPS.

[2] Hong Yan, et al. Pattern recognition techniques for the emerging field of bioinformatics: A review, 2005, Pattern Recognit.

[3] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[4] Hong Jia, et al. Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters, 2018, IEEE Transactions on Neural Networks and Learning Systems.

[5] Maria A. Zuluaga, et al. Detecting Clinically Meaningful Shape Clusters in Medical Image Data: Metrics Analysis for Hierarchical Clustering Applied to Healthy and Pathological Aortic Arches, 2017, IEEE Transactions on Biomedical Engineering.

[6] Zhexue Huang, et al. Clustering Large Data Sets with Mixed Numeric and Categorical Values, 1997.

[7] Cheng-Chung Chen, et al. A terms mining and clustering technique for surveying network and content analysis of academic groups exploration, 2017, Cluster Computing.

[8] Dongqi Fu, et al. New combination algorithms in commercial area data mining and clustering, 2016, 2016 IEEE International Conference on Big Data Analysis (ICBDA).

[9] Sean Hughes, et al. Clustering by Fast Search and Find of Density Peaks, 2016.

[10] Hong Jia, et al. Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number, 2013, Pattern Recognit.

[11] Francesco Masulli, et al. A survey of kernel and spectral methods for clustering, 2008, Pattern Recognit.

[12] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[13] Anil K. Jain. Data clustering: 50 years beyond K-means, 2008, Pattern Recognit. Lett.