Nearest Neighbor Median Shift Clustering for Binary Data

This paper describes the theory and practice of a new modal clustering method for binary data. Our approach, BinNNMS, is based on the nearest neighbor median shift. The median shift extends the well-known mean shift, which was designed for continuous data, to binary data. Theoretical and experimental analyses demonstrate that BinNNMS can accurately locate clusters in binary data.
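
To make the idea concrete, the following is a minimal sketch of a nearest neighbor median shift iteration on binary vectors. It is an illustration under stated assumptions, not necessarily the exact BinNNMS procedure: it assumes the Hamming distance, takes the median of a set of binary vectors to be the component-wise majority vote, and searches for neighbors among the original data points. The function names, the tie-breaking rule, and the stopping criterion are hypothetical choices made for this example.

```python
import numpy as np


def binary_median(points):
    """Component-wise majority vote of binary row vectors.

    Ties are resolved to 1, an arbitrary choice made for this sketch.
    """
    return (2 * points.sum(axis=0) >= len(points)).astype(np.uint8)


def nn_median_shift(X, k, max_iter=50):
    """Shift every point to the binary median of its k nearest neighbors.

    Neighbors are searched among the original data X using the Hamming
    distance (an assumption of this sketch). Iteration stops when no
    point moves or after max_iter passes.
    Returns (modes, labels): the distinct end points as tuples, and for
    each input row the index of the end point it reached.
    """
    X = np.asarray(X, dtype=np.uint8)
    Y = X.copy()
    for _ in range(max_iter):
        # Hamming distance from each current position to each data point
        dist = (Y[:, None, :] != X[None, :, :]).sum(axis=2)
        # Indices of the k closest data points for each current position
        nn = np.argsort(dist, axis=1, kind="stable")[:, :k]
        Y_new = np.array([binary_median(X[idx]) for idx in nn])
        if np.array_equal(Y_new, Y):
            break
        Y = Y_new

    # Points that converged to the same binary vector share a cluster label
    mode_index = {}
    labels = []
    for y in Y:
        key = tuple(int(v) for v in y)
        labels.append(mode_index.setdefault(key, len(mode_index)))
    return list(mode_index), labels
```

Called on a small data set made of two well-separated groups of binary vectors with an odd `k`, the sketch moves each point to the majority-vote vector of its group, and points sharing the same end point receive the same label. Using an odd `k` avoids ties in the majority vote.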
