Discriminative Clustering by Regularized Information Maximization

Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to un-supervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.

[1]  L. Hubert,et al.  Comparing partitions , 1985 .

[2]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[3]  David J. C. MacKay,et al.  Unsupervised Classifiers, Mutual Information and 'Phantom Targets' , 1991, NIPS.

[4]  Tommi S. Jaakkola,et al.  Maximum Entropy Discrimination , 1999, NIPS.

[5]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[6]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[7]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[8]  Adrian Corduneanu,et al.  On Information Regularization , 2002, UAI.

[9]  P. Dobson,et al.  Distinguishing enzyme structures from non-enzymes without alignments. , 2003, Journal of molecular biology.

[10]  Yoshua Bengio,et al.  Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[11]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[12]  W. Bialek,et al.  Information-based clustering. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Yixin Chen,et al.  CLUE: cluster-based retrieval of images by unsupervised learning , 2005, IEEE Transactions on Image Processing.

[14]  Alexander Zien,et al.  Semi-Supervised Classification by Low Density Separation , 2005, AISTATS.

[15]  Dale Schuurmans,et al.  Unsupervised and Semi-Supervised Multi-Class Support Vector Machines , 2005, AAAI.

[16]  George Karypis,et al.  Comparison of descriptor spaces for chemical compound retrieval and classification , 2006, Sixth International Conference on Data Mining (ICDM'06).

[17]  Yee Whye Teh,et al.  A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes , 2006, ACL.

[18]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  G. Karypis,et al.  Acyclic Subgraph Based Descriptor Spaces for Chemical Compound Retrieval and Classification , 2006 .

[20]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[21]  Zaïd Harchaoui,et al.  DIFFRAC: a discriminative and flexible framework for clustering , 2007, NIPS.

[22]  Le Song,et al.  A dependence maximization view of clustering , 2007, ICML '07.

[23]  Karsten M. Borgwardt,et al.  Fast subtree kernels on graphs , 2009, NIPS.

[24]  Ivor W. Tsang,et al.  Maximum Margin Clustering Made Practical , 2009, IEEE Trans. Neural Networks.