Discriminative Bayesian Nonparametric Clustering

We propose a general framework for discriminative Bayesian nonparametric clustering that promotes discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering with discriminative models by enforcing the latent cluster indices to be consistent with the labels predicted by a probabilistic discriminative model. This formulation yields a well-defined generative process in which either logistic regression or a support vector machine (SVM) can serve as the discriminative component. Using the proposed framework, we develop two novel discriminative BNP variants: discriminative Dirichlet process mixtures and discriminative-state infinite hidden Markov models (HMMs) for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that, by encouraging discrimination between the induced clusters, our model improves clustering quality compared with traditional generative BNP models.
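For readers unfamiliar with the BNP prior underlying Dirichlet process mixtures, the following is a minimal sketch of how a Chinese restaurant process assigns latent cluster indices without fixing the number of clusters in advance. It illustrates the generic prior only; it is not the discriminative model or the Gibbs samplers described above, and the function name `sample_crp_assignments` and its parameters are hypothetical choices made for this illustration.

```python
import random


def sample_crp_assignments(n_points, alpha, seed=0):
    """Draw cluster indices for n_points items from a Chinese restaurant process.

    alpha is the concentration parameter (must be > 0); larger values tend to
    open more clusters. This is an illustrative sketch of the prior only.
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = cluster index of point i
    counts = []       # counts[k] = number of points currently in cluster k
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    z, sizes = sample_crp_assignments(n_points=20, alpha=1.0, seed=42)
    print("cluster indices:", z)
    print("cluster sizes:  ", sizes)
```

In a discriminative variant such as the one proposed here, these latent indices would additionally be tied to the labels predicted by a classifier; that coupling is deliberately left out of this sketch.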
