Small-Variance Asymptotics for Dirichlet Process Mixtures of SVMs

Infinite SVM (iSVM) is a Dirichlet process (DP) mixture of large-margin classifiers. Though flexible in learning nonlinear classifiers and discovering latent clustering structures, iSVM poses a difficult inference problem, and the cost of existing inference methods hinders its applicability to large-scale problems. This paper presents a small-variance asymptotic analysis that yields a simple and efficient algorithm, which monotonically optimizes a max-margin DP-means (M2DPM) objective, an extension of DP-means that performs both predictive learning and descriptive clustering. Our analysis builds on Gibbs infinite SVMs, an alternative DP mixture of large-margin machines that admits a partially collapsed Gibbs sampler without truncation by exploiting data augmentation techniques. Experimental results show that M2DPM runs much faster than comparable algorithms without sacrificing prediction accuracy.
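To make the flavor of the algorithm concrete, below is a minimal sketch of a DP-means-style loop augmented with a per-cluster large-margin term. It is not the paper's M2DPM procedure: the objective here (squared distance plus a hinge loss, traded off by the illustrative parameters `lam` and `c`), the function name `m2dpm_sketch`, and the single subgradient pass used in place of a full per-cluster SVM solve are all assumptions made for illustration.

```python
import numpy as np

def m2dpm_sketch(X, y, lam=4.0, c=1.0, n_iters=20):
    """Toy DP-means-style loop with a per-cluster linear classifier.

    Illustrative only: each point pays a squared distance to its cluster
    center plus a hinge loss under that cluster's classifier, and a new
    cluster is opened when the best available cost exceeds the penalty
    `lam`. The actual M2DPM objective and updates differ in detail.
    Labels y are assumed to lie in {-1, +1}.
    """
    n, d = X.shape
    mus = [X.mean(axis=0)]        # cluster centers
    ws = [np.zeros(d)]            # per-cluster classifier weights
    z = np.zeros(n, dtype=int)    # cluster assignments

    def cost(i, k):
        dist = np.sum((X[i] - mus[k]) ** 2)
        hinge = max(0.0, 1.0 - y[i] * ws[k] @ X[i])
        return dist + c * hinge

    for _ in range(n_iters):
        # Assignment step: pick the cheapest cluster, or open a new one
        # when even the best cluster costs more than the penalty lam.
        for i in range(n):
            costs = [cost(i, k) for k in range(len(mus))]
            k_best = int(np.argmin(costs))
            if costs[k_best] > lam:
                mus.append(X[i].copy())
                ws.append(np.zeros(d))
                z[i] = len(mus) - 1
            else:
                z[i] = k_best
        # Update step: recompute centers and nudge each classifier with
        # one hinge-loss subgradient pass (a full SVM solve in practice).
        for k in range(len(mus)):
            idx = np.flatnonzero(z == k)
            if idx.size == 0:
                continue
            mus[k] = X[idx].mean(axis=0)
            for i in idx:
                if y[i] * ws[k] @ X[i] < 1.0:
                    ws[k] += 0.1 * c * y[i] * X[i]
    return z, mus, ws
```

As in DP-means, the penalty `lam` replaces a fixed cluster count: larger values yield fewer clusters, and each sweep never increases the assumed objective, mirroring the monotone descent property claimed for M2DPM.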
