Regularized discriminative clustering

A distributional clustering model for continuous data is reviewed, and methods for optimizing and regularizing it are introduced and compared. Based on pairs of auxiliary and primary data, the primary data space is partitioned into Voronoi regions that are maximally homogeneous in terms of the auxiliary data. Consequently, only variation in the primary data that is associated with variation in the auxiliary data influences the clusters. Because the whole primary space is partitioned, new samples can be clustered using the primary data alone. In experiments, the approach is shown to produce more homogeneous clusters than alternative methods. Two regularization methods are demonstrated to further improve the results: an entropy-type penalty for unequal cluster sizes, and the inclusion of a K-means component in the model. The latter can alternatively be interpreted as a special kind of joint distribution modeling in which the balance between discrimination and unsupervised modeling of the primary data can be tuned.
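The two ideas in the abstract that lend themselves to a concrete illustration are the Voronoi assignment of new samples and the notion of cluster homogeneity with respect to auxiliary data. The sketch below is not the optimization method of the paper; it only assumes that clusters are represented by prototype vectors, that auxiliary data are discrete class labels, and that homogeneity is scored by the conditional entropy of the labels given the cluster (lower meaning more homogeneous). The function names `assign_to_voronoi` and `conditional_entropy` are hypothetical.

```python
import numpy as np


def assign_to_voronoi(x, prototypes):
    """Return, for each row of x, the index of the nearest prototype.

    Nearest-prototype assignment under Euclidean distance is exactly
    membership in the Voronoi region of that prototype.
    """
    diffs = x[:, None, :] - prototypes[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


def conditional_entropy(clusters, labels, n_clusters, n_classes):
    """Empirical entropy (in nats) of auxiliary labels given the cluster.

    Assumption for illustration: lower values mean clusters that are more
    homogeneous in terms of the auxiliary data.
    """
    counts = np.zeros((n_clusters, n_classes))
    np.add.at(counts, (clusters, labels), 1)
    joint = counts / counts.sum()
    cluster_marginal = joint.sum(axis=1, keepdims=True)
    # p(label | cluster); empty clusters are left at zero.
    cond = np.divide(joint, cluster_marginal,
                     out=np.zeros_like(joint),
                     where=cluster_marginal > 0)
    mask = joint > 0
    return -(joint[mask] * np.log(cond[mask])).sum()


# Toy primary data (2-D) with two hypothetical prototypes.
prototypes = np.array([[0.0, 0.0], [10.0, 0.0]])
x = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 0.0], [10.0, 1.0]])
clusters = assign_to_voronoi(x, prototypes)

# Labels that follow the partition versus labels that ignore it.
aligned = np.array([0, 0, 1, 1])
mixed = np.array([0, 1, 0, 1])
print(conditional_entropy(clusters, aligned, 2, 2))
print(conditional_entropy(clusters, mixed, 2, 2))
```

Because `assign_to_voronoi` needs only the primary data, a new sample can be clustered without its auxiliary label; the labels enter only when the partition is evaluated or, in the paper's setting, optimized.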
