Non-parametric Mixture Models for Clustering

Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., a mixture of Gaussian distributions, or GMM), which significantly limits their capacity to fit the diverse multidimensional data distributions encountered in practice. We propose a non-parametric mixture model (NMM) for data clustering that detects clusters generated from arbitrary unknown distributions using non-parametric kernel density estimates. The proposed model is non-parametric in that the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. The parameters of the model are estimated by maximizing a leave-one-out likelihood. When applied to clustering high-dimensional text datasets, the NMM approach significantly outperforms state-of-the-art and classical approaches such as K-means, Gaussian mixture models, spectral clustering, and linkage methods.
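
To make the leave-one-out idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' algorithm. It assumes a Gaussian kernel, a fixed bandwidth, and a hypothetical (n, G) matrix of cluster weights; the function names and the way the mixing proportions are derived from the weights are illustrative assumptions. Each point's density under a cluster is estimated only from the other points, and the log-likelihood and cluster responsibilities follow from those estimates.

```python
import numpy as np


def gaussian_kernel(X, bandwidth):
    """Pairwise isotropic Gaussian kernel values for the rows of X (assumed kernel)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    dim = X.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (-dim / 2.0)
    return norm * np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def loo_mixture(X, weights, bandwidth):
    """Leave-one-out log-likelihood and responsibilities.

    weights is a hypothetical (n, G) array: weights[j, g] says how strongly
    point j contributes to the density estimate of cluster g. Every cluster
    must keep positive total weight after any single point is left out.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, bandwidth)
    np.fill_diagonal(K, 0.0)  # a point never contributes to its own density

    numer = K @ weights                              # sum_{j != i} K(x_i, x_j) w_jg
    denom = weights.sum(axis=0)[None, :] - weights   # sum_{j != i} w_jg
    cond = numer / denom                             # leave-one-out p(x_i | g)

    priors = weights.sum(axis=0) / n                 # assumed mixing proportions
    joint = cond * priors[None, :]
    mixture = joint.sum(axis=1)                      # leave-one-out p(x_i)

    log_likelihood = float(np.sum(np.log(mixture)))
    responsibilities = joint / mixture[:, None]      # p(g | x_i)
    return log_likelihood, responsibilities


# Toy data: two well-separated blobs with one-hot weights from the true labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
labels = np.repeat([0, 1], 20)
weights = np.eye(2)[labels]

ll, resp = loo_mixture(X, weights, bandwidth=1.0)
print(ll, resp.argmax(axis=1))
```

Zeroing the diagonal of the kernel matrix is what makes the estimate leave-one-out: without it, each point would explain itself, and the likelihood could be inflated simply by shrinking the bandwidth. A full method would also need a procedure for updating the weights, which this sketch leaves out.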
