Small-Variance Asymptotics for Dirichlet Process Mixtures of SVMs

Infinite SVM (iSVM) is a Dirichlet process (DP) mixture of large-margin classifiers. Though flexible in learning nonlinear classifiers and discovering latent clustering structures, iSVM poses a difficult inference problem, and the cost of existing inference methods hinders its applicability to large-scale problems. This paper presents a small-variance asymptotic analysis that yields a simple and efficient algorithm, which monotonically optimizes a max-margin DP-means (M2DPM) objective, an extension of DP-means that performs both predictive learning and descriptive clustering. Our analysis builds on Gibbs infinite SVMs, an alternative DP mixture of large-margin machines that admits a partially collapsed Gibbs sampler without truncation by exploiting data augmentation techniques. Experimental results show that M2DPM runs much faster than comparable algorithms without sacrificing prediction accuracy.
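To make the flavor of the algorithm concrete, below is a minimal sketch of a DP-means-style loop augmented with a per-cluster large-margin term. It is not the paper's M2DPM procedure: the objective here (squared distance plus a hinge loss, traded off by the illustrative parameters `lam` and `c`), the function name `m2dpm_sketch`, and the single subgradient pass used in place of a full per-cluster SVM solve are all assumptions made for illustration.

```python
import numpy as np

def m2dpm_sketch(X, y, lam=4.0, c=1.0, n_iters=20):
    """Toy DP-means-style loop with a per-cluster linear classifier.

    Illustrative only: each point pays a squared distance to its cluster
    center plus a hinge loss under that cluster's classifier, and a new
    cluster is opened when the best available cost exceeds the penalty
    `lam`. The actual M2DPM objective and updates differ in detail.
    Labels y are assumed to lie in {-1, +1}.
    """
    n, d = X.shape
    mus = [X.mean(axis=0)]        # cluster centers
    ws = [np.zeros(d)]            # per-cluster classifier weights
    z = np.zeros(n, dtype=int)    # cluster assignments

    def cost(i, k):
        dist = np.sum((X[i] - mus[k]) ** 2)
        hinge = max(0.0, 1.0 - y[i] * ws[k] @ X[i])
        return dist + c * hinge

    for _ in range(n_iters):
        # Assignment step: pick the cheapest cluster, or open a new one
        # when even the best cluster costs more than the penalty lam.
        for i in range(n):
            costs = [cost(i, k) for k in range(len(mus))]
            k_best = int(np.argmin(costs))
            if costs[k_best] > lam:
                mus.append(X[i].copy())
                ws.append(np.zeros(d))
                z[i] = len(mus) - 1
            else:
                z[i] = k_best
        # Update step: recompute centers and nudge each classifier with
        # one hinge-loss subgradient pass (a full SVM solve in practice).
        for k in range(len(mus)):
            idx = np.flatnonzero(z == k)
            if idx.size == 0:
                continue
            mus[k] = X[idx].mean(axis=0)
            for i in idx:
                if y[i] * ws[k] @ X[i] < 1.0:
                    ws[k] += 0.1 * c * y[i] * X[i]
    return z, mus, ws
```

As in DP-means, the penalty `lam` replaces a fixed cluster count: larger values yield fewer clusters, and each sweep never increases the assumed objective, mirroring the monotone descent property claimed for M2DPM.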
