A Convex Exemplar-based Approach to MAD-Bayes Dirichlet Process Mixture Models

MAD-Bayes (MAP-based Asymptotic Derivations) has been recently proposed as a general technique to derive scalable algorithm for Bayesian Nonparametric models. However, the combinatorial nature of objective functions derived from MAD-Bayes results in hard optimization problem, for which current practice employs heuristic algorithms analogous to k-means to find local minimum. In this paper, we consider the exemplar-based version of MAD-Bayes formulation for DP and Hierarchical DP (HDP) mixture model. We show that an exemplarbased MAD-Bayes formulation can be relaxed to a convex structural-regularized program that, under cluster-separation conditions, shares the same optimal solution to its combinatorial counterpart. An algorithm based on Alternating Direction Method of Multiplier (ADMM) is then proposed to solve such program. In our experiments on several benchmark data sets, the proposed method finds optimal solution of the combinatorial problem and significantly improves existing methods in terms of the exemplar-based objective.

[1]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[2]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[3]  Ke Jiang,et al.  Small-Variance Asymptotics for Hidden Markov Models , 2013, NIPS.

[4]  Han Liu,et al.  Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery , 2009, ICML '09.

[5]  M. J. van der Laan,et al.  A new partitioning around medoids algorithm , 2003 .

[6]  Guillermo Sapiro,et al.  Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery , 2012, NIPS.

[7]  Michael I. Jordan,et al.  Revisiting k-means: New Algorithms via Bayesian Nonparametrics , 2011, ICML.

[8]  Peter J. Rousseeuw,et al.  Clustering by means of medoids , 1987 .

[9]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[10]  Nimrod Megiddo,et al.  On the Complexity of Some Common Geometric Location Problems , 1984, SIAM J. Comput..

[11]  Michael I. Jordan,et al.  Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models , 2012, NIPS.

[12]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[13]  Michael I. Jordan,et al.  MAD-Bayes: MAP-based Asymptotic Derivations from Bayes , 2012, ICML.

[14]  Ravishankar Krishnaswamy,et al.  Relax, No Need to Round: Integrality of Clustering Formulations , 2014, ITCS.

[15]  Rachel Ward,et al.  Recovery guarantees for exemplar-based clustering , 2013, Inf. Comput..

[16]  Jean-Philippe Vert,et al.  Group Lasso with Overlaps: the Latent Group Lasso approach , 2011, ArXiv.

[17]  Zhi-Quan Luo,et al.  On the linear convergence of the alternating direction method of multipliers , 2012, Mathematical Programming.

[18]  Jonathan P. How,et al.  Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture , 2013, NIPS.

[19]  Christos H. Papadimitriou,et al.  Worst-Case and Probabilistic Analysis of a Geometric Location Problem , 1981, SIAM J. Comput..