Spectral Chinese Restaurant Processes: Nonparametric Clustering Based on Similarities

We introduce a new nonparametric clustering model which combines the recently proposed distance-dependent Chinese restaurant process (dd-CRP) and non-linear, spectral methods for dimensionality reduction. Our model retains the ability of nonparametric methods to learn the number of clusters from data. At the same time it addresses two key limitations of nonparametric Bayesian methods: modeling data that are not exchangeable and have many correlated features. Spectral methods use the similarity between documents to map them into a low-dimensional spectral space where we then compare several clustering methods. Our experiments on handwritten digits and text documents show that nonparametric methods such as the CRP or dd-CRP can perform as well as or better than k-means and also recover the true number of clusters. We improve the performance of the dd-CRP in spectral space by incorporating the original similarity matrix in its prior. This simple modification results in better performance than all other methods we compared to. We offer a new formulation and first experimental evaluation of a general Gibbs sampler for mixture modeling with distance-dependent CRPs.

[1]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[2]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[3]  Antonio Torralba,et al.  Describing Visual Scenes Using Transformed Objects and Parts , 2008, International Journal of Computer Vision.

[4]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[5]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[6]  Jerry Nedelman,et al.  Book review: “Bayesian Data Analysis,” Second Edition by A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin Chapman & Hall/CRC, 2004 , 2005, Comput. Stat..

[7]  Carl E. Rasmussen,et al.  The Infinite Gaussian Mixture Model , 1999, NIPS.

[8]  Michael I. Jordan,et al.  Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes , 2008, NIPS.

[9]  Peter I. Frazier,et al.  Distance dependent Chinese restaurant processes , 2009, ICML.

[10]  J. Pitman Combinatorial Stochastic Processes , 2006 .

[11]  Nicolas Le Roux,et al.  Spectral Dimensionality Reduction , 2006, Feature Extraction.

[12]  Erik B. Sudderth Graphical models for visual object recognition and tracking , 2006 .

[13]  Marina Meila,et al.  Comparing Clusterings by the Variation of Information , 2003, COLT.

[14]  Adrian E. Raftery,et al.  How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis , 1998, Comput. J..

[15]  Radford M. Neal Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[16]  Andrew McCallum,et al.  Topics over time: a non-Markov continuous-time model of topical trends , 2006, KDD '06.

[17]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[18]  Michael J. Black,et al.  A nonparametric Bayesian alternative to spike sorting , 2008, Journal of Neuroscience Methods.

[19]  Inderjit S. Dhillon,et al.  Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.

[20]  David B. Dunson,et al.  A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation , 2009, NIPS.

[21]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[22]  Andrew McCallum,et al.  Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors , 2008 .

[23]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[24]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[25]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..