Nonlinear Maximum Margin Multi-View Learning with Adaptive Kernel

Existing kernel-based multi-view learning methods either require the user to select and tune a single predefined kernel or must compute and store many Gram matrices to perform multiple kernel learning. Besides the substantial human effort and the computational and memory costs involved, most of these models seek point estimates of their parameters and are prone to overfitting when training data are scarce. This paper presents an adaptive-kernel nonlinear max-margin multi-view learning model under the Bayesian framework. Specifically, we regularize the posterior of an efficient multi-view latent variable model by explicitly mapping the latent representations extracted from multiple data views to a random Fourier feature space, where max-margin classification constraints are imposed. Assuming the frequencies of these random features are drawn from Dirichlet process Gaussian mixtures, we can adaptively learn shift-invariant kernels from data according to Bochner's theorem. For inference, we employ the data augmentation idea for the hinge loss and design an efficient gradient-based MCMC sampler in the augmented space. Because it never computes the Gram matrix, our algorithm scales linearly with the size of the training set. Extensive experiments on real-world datasets demonstrate that our method achieves superior performance.
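The kernel-learning step rests on Bochner's theorem: a continuous shift-invariant kernel k(x - y) with k(0) = 1 is positive definite exactly when it is the Fourier transform of a probability density p(ω). In that case k(x - y) = E_{ω~p}[cos(ωᵀ(x - y))], so sampling frequencies ω gives random Fourier features whose inner products approximate k. The sketch below assumes a finite two-component Gaussian mixture with diagonal covariances as a stand-in for the paper's Dirichlet process mixture. The helper names (`sample_mixture_frequencies`, `random_fourier_features`, `spectral_mixture_kernel`) and all parameter values are hypothetical. The sketch compares the Monte Carlo estimate with the closed-form kernel Σ_q π_q exp(-½ δᵀΣ_q δ) cos(μ_qᵀδ) implied by the same mixture.

```python
import numpy as np

def sample_mixture_frequencies(weights, means, scales, num_features, rng):
    """Draw D frequencies from a finite Gaussian mixture with diagonal covariances.

    Hypothetical stand-in for a Dirichlet process Gaussian mixture: the number of
    components, the weights, means and scales are fixed, illustrative choices.
    """
    comps = rng.choice(len(weights), size=num_features, p=weights)
    eps = rng.standard_normal((num_features, means.shape[1]))
    return means[comps] + scales[comps] * eps  # shape (D, d)

def random_fourier_features(X, omegas):
    """Map X (n, d) to z(X) (n, 2D) so that z(x) @ z(y) = mean_d cos(omega_d^T (x - y))."""
    proj = X @ omegas.T  # (n, D)
    D = omegas.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

def spectral_mixture_kernel(x, y, weights, means, scales):
    """Closed-form kernel whose spectral density is the same mixture (Bochner's theorem)."""
    delta = x - y
    return sum(
        w * np.exp(-0.5 * np.sum((s * delta) ** 2)) * np.cos(m @ delta)
        for w, m, s in zip(weights, means, scales)
    )

# Usage with hypothetical mixture parameters
rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [1.5, -0.5]])
scales = np.array([[1.0, 1.0], [0.3, 0.3]])
omegas = sample_mixture_frequencies(weights, means, scales, 20000, rng)
X = rng.standard_normal((2, 2))
Z = random_fourier_features(X, omegas)
approx = Z[0] @ Z[1]
exact = spectral_mixture_kernel(X[0], X[1], weights, means, scales)
print(abs(approx - exact))  # Monte Carlo error, shrinking roughly like 1/sqrt(D)
```

Because a downstream classifier only touches the n × 2D feature matrix, building it costs O(nDd) time and O(nD) memory, linear in the number of training points n, whereas an exact kernel method would form the n × n Gram matrix.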
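For the hinge loss, the data augmentation of Polson and Scott writes the max-margin pseudo-likelihood exp(-2c·max(0, ζ_i)), with ζ_i = ℓ - y_i wᵀz_i, as a Gaussian scale mixture over an auxiliary variable λ_i. Conditionally, λ_i⁻¹ follows an inverse Gaussian distribution with mean 1/|cζ_i| and shape 1, and w follows a Gaussian, so both can be sampled exactly. The sketch below is a plain Gibbs sampler for a linear classifier on fixed features, assuming a N(0, ν²I) prior on w. It does not reproduce the paper's gradient-based MCMC updates of the latent representations and kernel parameters. The function name `gibbs_max_margin`, the toy data, and the values of c, ℓ and ν² are illustrative assumptions.

```python
import numpy as np

def gibbs_max_margin(Z, y, c=1.0, ell=1.0, nu2=1.0, num_iters=200, rng=None):
    """Gibbs sampler for a linear max-margin classifier via hinge-loss augmentation.

    Z: (n, D) features (e.g. random Fourier features); y: labels in {-1, +1}.
    Pseudo-likelihood exp(-2c * max(0, ell - y_i w^T z_i)); prior w ~ N(0, nu2 * I).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, D = Z.shape
    w = np.zeros(D)
    samples = []
    for _ in range(num_iters):
        # 1) 1/lambda_i | w ~ InverseGaussian(mean=1/|c*zeta_i|, shape=1);
        #    NumPy's Wald distribution is the inverse Gaussian. Floor avoids division by 0.
        zeta = ell - y * (Z @ w)
        ig_mean = 1.0 / np.maximum(np.abs(c * zeta), 1e-6)
        lam = 1.0 / rng.wald(ig_mean, 1.0)
        # 2) w | lambda ~ N(mu, P^{-1}) with
        #    P = I/nu2 + c^2 sum_i z_i z_i^T / lambda_i,
        #    P mu = sum_i c y_i (lambda_i + c*ell) / lambda_i * z_i
        precision = np.eye(D) / nu2 + (c ** 2) * (Z.T * (1.0 / lam)) @ Z
        rhs = Z.T @ (c * y * (lam + c * ell) / lam)
        mu = np.linalg.solve(precision, rhs)
        # If P = L L^T, then mu + L^{-T} eps has covariance P^{-1}
        L = np.linalg.cholesky(precision)
        w = mu + np.linalg.solve(L.T, rng.standard_normal(D))
        samples.append(w)
    return np.array(samples)

# Usage on hypothetical toy 2-D data, with a constant feature acting as a bias
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])
Z = np.hstack([X, np.ones((200, 1))])
samples = gibbs_max_margin(Z, y, num_iters=300, rng=rng)
w_mean = samples[100:].mean(axis=0)  # posterior mean after discarding burn-in
train_acc = np.mean(np.sign(Z @ w_mean) == y)
```

Each sweep costs O(nD² + D³), so for a fixed number of features D the cost grows linearly with n, and no n × n Gram matrix is ever formed.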