Optimizing the Cauchy-Schwarz PDF Distance for Information Theoretic, Non-parametric Clustering

This paper addresses the problem of efficient information theoretic, non-parametric data clustering. We develop a procedure for adapting the cluster memberships of the data patterns, in order to maximize the recent Cauchy-Schwarz (CS) probability density function (pdf) distance measure. Each pdf corresponds to a cluster. The CS distance is estimated analytically and non-parametrically by means of the Parzen window technique for density estimation. The resulting form of the cost function makes it possible to develop an efficient adaption procedure based on constrained gradient descent, using stochastic approximation of the gradients. The computational complexity of the algorithm is O(MN), M ≪ N, where N is the total number of data patterns and M is the number of data patterns used in the stochastic approximation. We show that the new algorithm is capable of performing well on several odd-shaped and irregular data sets.

[1]  Robert Jenssen,et al.  The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space , 2004, NIPS.

[2]  Stephen J. Roberts,et al.  Maximum certainty data partitioning , 2000, Pattern Recognit..

[3]  E. Parzen On Estimation of a Probability Density Function and Mode , 1962 .

[4]  Geoffrey C. Fox,et al.  Vector quantization by deterministic annealing , 1992, IEEE Trans. Inf. Theory.

[5]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[6]  Joachim M. Buhmann,et al.  Pairwise Data Clustering by Deterministic Annealing , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  P. J. Green,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[8]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[9]  James C. Bezdek,et al.  A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Naftali Tishby,et al.  Data Clustering by Markovian Relaxation and the Information Bottleneck Method , 2000, NIPS.

[12]  José Carlos Príncipe,et al.  Information Theoretic Clustering , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  André Hardy,et al.  An examination of procedures for determining the number of clusters in a data set , 1994 .

[14]  C. D. Kemp,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[15]  Robert Jenssen,et al.  Information cut and information forces for clustering , 2003, 2003 IEEE XIII Workshop on Neural Networks for Signal Processing (IEEE Cat. No.03TH8718).