Deep Max-Margin Discriminant Projection

In this paper, we propose a unified Bayesian max-margin discriminant projection framework that jointly learns the discriminant feature space and the max-margin classifier under different relationships between the latent representations and the observations. We assume that the latent representation follows a normal distribution whose sufficient statistics are functions of the observations. These functions can be realized flexibly through either shallow or deep structures. The shallow structures include linear and nonlinear kernel-based functions, as well as a convolutional projection, which can be trained layer by layer to build a multilayered convolutional feature-learning model. To take advantage of deep neural networks, in particular their high expressive power and efficient parameter learning, we integrate Bayesian modeling with popular neural networks, such as the multilayer perceptron and the convolutional neural network, to build an end-to-end Bayesian deep discriminant projection under the proposed framework, which degenerates to the existing shallow linear or convolutional projection when a single-layer structure is used. Moreover, efficient and scalable inference procedures for the realizations with different functions are derived to handle large-scale data via stochastic gradient Markov chain Monte Carlo. Finally, we demonstrate the effectiveness and efficiency of the proposed models through experiments on real-world data, including four image benchmarks (MNIST, CIFAR-10, STL-10, and SVHN) and one measured radar high-resolution range profile dataset, together with a detailed analysis of the parameters and computational complexity.
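
To make the construction concrete, the sketch below is a minimal illustration (not the authors' implementation) of one plausible reading of the abstract: a small MLP maps each observation to the sufficient statistics (mean and log-variance) of a Gaussian latent projection, a multiclass hinge loss stands in for the max-margin classifier, and parameters are updated with a stochastic gradient Langevin dynamics (SGLD)-style noisy step, one member of the stochastic gradient MCMC family mentioned above. All layer sizes, priors, and step sizes are hypothetical choices, and PyTorch is assumed.

```python
import torch
import torch.nn as nn

class DeepMaxMarginProjection(nn.Module):
    """Hypothetical sketch: Gaussian latent projection + hinge-loss classifier."""
    def __init__(self, in_dim, latent_dim, n_classes, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)        # mean of the Gaussian latent
        self.logvar = nn.Linear(hidden, latent_dim)    # log-variance of the Gaussian latent
        self.classifier = nn.Linear(latent_dim, n_classes)  # max-margin (hinge) classifier

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterized sample of the latent projection z ~ N(mu, diag(exp(logvar)))
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.classifier(z), mu, logvar

def sgld_step(model, x, y, lr=1e-4, prior_prec=1.0, n_data=60000):
    """One SGLD-style update: a half-step along the mini-batch-scaled gradient of the
    negative log posterior, plus Gaussian noise with variance lr (Welling & Teh, 2011)."""
    scores, mu, logvar = model(x)
    hinge = nn.MultiMarginLoss()(scores, y)                        # max-margin data term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # Gaussian regularizer on z
    loss = n_data * (hinge + kl)                                   # rescale mini-batch to full data
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            grad = p.grad + prior_prec * p                         # Gaussian prior on the weights
            p -= 0.5 * lr * grad + torch.randn_like(p) * lr ** 0.5 # noisy (Langevin) step

# Hypothetical usage on a batch of flattened 28x28 images with 10 classes.
model = DeepMaxMarginProjection(in_dim=784, latent_dim=32, n_classes=10)
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
sgld_step(model, x, y)
```

In this sketch the hinge loss plays the role of the max-margin discriminant term and the Gaussian term regularizes the latent projection; collecting the parameters visited by repeated `sgld_step` calls would give approximate posterior samples rather than a single point estimate.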
