Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines

We present Infinite SVM (iSVM), a Dirichlet process mixture of large-margin kernel machines for multi-way classification. iSVM combines the advantages of Bayesian nonparametrics, which handles an unknown number of mixture components, with those of large-margin kernel machines, which robustly capture the local nonlinearity of complex data. We develop an efficient variational learning algorithm for posterior inference in iSVM, and we demonstrate its advantages over Dirichlet process mixtures of generalized linear models and other benchmarks on both synthetic and real Flickr image classification datasets.
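
To illustrate how a Dirichlet process prior can leave the number of mixture components open, the following minimal sketch draws mixing weights from a truncated stick-breaking construction. This is a standard textbook representation of the Dirichlet process; the abstract does not say which representation iSVM uses, so the construction, the function name `stick_breaking_weights`, the truncation level, and the concentration value `alpha` are all assumptions made for illustration only.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixing weights from a truncated stick-breaking construction.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for k in range(truncation):
        # v_k ~ Beta(1, alpha); the final break takes the whole remainder
        # so that the truncated weights sum to one.
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


weights = stick_breaking_weights(alpha=1.0, truncation=20)
```

With a small `alpha`, most of the mass tends to fall on the first few components, so only a handful of components are effectively active even though the truncation level allows more.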
