Bayesian Maximum Margin Principal Component Analysis

Supervised dimensionality reduction has shown great advantages in finding predictive subspaces. Previous methods rarely consider the popular maximum margin principle and are prone to overfitting on the typically small training sets, especially methods built on the maximum likelihood framework. In this paper, we present a posterior-regularized Bayesian approach that combines Principal Component Analysis (PCA) with max-margin learning. Building on the data augmentation idea for max-margin learning and the probabilistic interpretation of PCA, our method automatically infers the weight and penalty parameters of the max-margin learning machine while simultaneously finding the most appropriate PCA subspace under the Bayesian framework. We develop a fast mean-field variational inference algorithm to approximate the posterior. Experimental results on various classification tasks show that our method outperforms a number of competitors.
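The abstract does not spell out the data augmentation step. As a minimal sketch, assume it rests on the known scale-mixture-of-Gaussians representation of the hinge loss used in data augmentation for support vector machines: exp(-2c max(0, ζ)) = ∫₀^∞ (2πλ)^(-1/2) exp(-(λ + cζ)² / (2λ)) dλ, where ζ is the margin violation, c the penalty parameter, and λ the augmented variable. The function names, the quadrature settings, and the choice of ζ values below are illustrative assumptions, not taken from the paper. The snippet only checks the identity numerically.

```python
import math


def hinge_pseudo_likelihood(zeta, c=1.0):
    """Unnormalized hinge-loss pseudo-likelihood exp(-2c * max(0, zeta))."""
    return math.exp(-2.0 * c * max(0.0, zeta))


def augmented_integral(zeta, c=1.0, upper=12.0, steps=20000):
    """Numerically integrate the augmented form over lambda in (0, inf).

    Uses the substitution lambda = u**2, which removes the 1/sqrt(lambda)
    singularity, and a midpoint rule on u in (0, upper). `upper` and
    `steps` are illustrative quadrature settings (assumptions).
    """
    a = c * zeta
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint, so u is never exactly zero
        total += math.exp(-((u * u + a) ** 2) / (2.0 * u * u))
    # Factor 2/sqrt(2*pi) comes from d(lambda) = 2u du and 1/sqrt(2*pi*lambda).
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h


if __name__ == "__main__":
    for zeta in (-1.5, 0.0, 0.7, 2.0):
        exact = hinge_pseudo_likelihood(zeta)
        approx = augmented_integral(zeta)
        print(f"zeta={zeta:+.1f}  closed form={exact:.6f}  integral={approx:.6f}")
```

Conditioned on λ, the integrand is Gaussian in ζ, which is what makes conjugate-style updates possible in a mean-field scheme under this assumption.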
