CDP Mixture Models for Data Clustering

In Dirichlet process (DP) mixture models, the number of components is determined implicitly by the sampling parameters of the Dirichlet process. However, models of this kind often produce many small mixture components when fitted to real-world data, especially high-dimensional data. In this paper, we propose a new class of Dirichlet process mixture models subject to certain constraint principles, named constrained Dirichlet process (CDP) mixture models. Building on general DP mixture models, we add a resampling step to obtain the latent parameters. In this way, CDP mixture models can suppress noise and produce compact patterns of the data. Experimental results on data clustering demonstrate the strong performance of CDP mixture models.
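
To illustrate how the number of components in a DP mixture arises implicitly from the sampling parameters, the following minimal sketch draws component sizes from a Chinese restaurant process, the standard partition prior induced by a Dirichlet process. It is only an illustration of the unconstrained DP prior and does not implement the CDP resampling step. The function name `sample_crp_sizes`, the concentration parameter name `alpha`, and the "5 points or fewer" threshold for a small component are assumptions made for this example.

```python
import random


def sample_crp_sizes(n_points, alpha, seed=0):
    """Draw component sizes from a Chinese restaurant process (illustrative).

    n_points: number of data points to assign.
    alpha:    DP concentration parameter (hypothetical name for this sketch).
    Returns a list where entry k is the number of points in component k.
    """
    rng = random.Random(seed)
    sizes = []
    for i in range(n_points):
        # Point i joins existing component k with probability sizes[k] / (i + alpha)
        # and opens a new component with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:
            # r fell in the remaining mass of width alpha: open a new component.
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    for alpha in (0.5, 5.0, 50.0):
        sizes = sample_crp_sizes(1000, alpha)
        small = sum(1 for s in sizes if s <= 5)
        print(f"alpha={alpha}: {len(sizes)} components, {small} with <= 5 points")
```

Running the script for several values of `alpha` shows that the component count is never set directly but emerges from the sampling process, and that a share of the components typically contain only a few points.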