Deep Linear Discriminant Analysis

We introduce Deep Linear Discriminant Analysis (DeepLDA), which learns linearly separable latent representations in an end-to-end fashion. Classic LDA extracts features that preserve class separability and is used for dimensionality reduction in many classification problems. The central idea of this paper is to put LDA on top of a deep neural network, which can be seen as a non-linear extension of classic LDA. Instead of maximizing the likelihood of target labels for individual samples, we propose an objective function that pushes the network to produce feature distributions which (a) have low variance within the same class and (b) have high variance between different classes. Our objective is derived from the general LDA eigenvalue problem and still allows training with stochastic gradient descent and back-propagation. For evaluation, we test our approach on three benchmark datasets (MNIST, CIFAR-10, and STL-10). DeepLDA produces competitive results on MNIST and CIFAR-10, and outperforms a network trained with categorical cross-entropy (using the same architecture) in a supervised setting on STL-10.
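
To make the objective concrete, the following is a minimal NumPy/SciPy sketch of an LDA-based loss computed on a mini-batch of hidden activations. It is an illustrative reconstruction, not the paper's exact formulation: the function name deep_lda_loss, the regularizer eps, and the choice to maximize the mean of the informative eigenvalues are assumptions (the actual DeepLDA loss is derived from the same generalized eigenvalue problem but may weight the eigenvalues differently, e.g. focusing on the smallest ones).

```python
import numpy as np
import scipy.linalg


def deep_lda_loss(H, y, n_classes, eps=1e-3):
    """Hypothetical sketch of a DeepLDA-style objective.

    H : (N, d) array of network outputs for a mini-batch.
    y : (N,) integer class labels (each class assumed to appear
        at least twice in the batch).
    Returns the negative mean of the informative eigenvalues of the
    generalized LDA eigenvalue problem S_b v = lambda (S_w + eps*I) v,
    so that minimizing the loss maximizes class separation.
    """
    N, d = H.shape
    mean_total = H.mean(axis=0)

    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in range(n_classes):
        Hc = H[y == c]
        Xc = Hc - Hc.mean(axis=0)
        Sw += Xc.T @ Xc / (len(Hc) - 1)
        diff = (Hc.mean(axis=0) - mean_total)[:, None]
        Sb += diff @ diff.T
    Sw /= n_classes
    Sb /= n_classes

    # Regularize S_w: mini-batch scatter matrices can be singular,
    # and the generalized eigensolver needs a positive-definite S_w.
    Sw += eps * np.eye(d)

    # Solve S_b v = lambda S_w v; only the largest n_classes - 1
    # eigenvalues are informative (S_b has rank at most C - 1).
    evals = scipy.linalg.eigh(Sb, Sw, eigvals_only=True)
    evals = evals[-(n_classes - 1):]

    # Maximize separation in all discriminant directions by
    # maximizing the mean eigenvalue (negated for minimization).
    return -np.mean(evals)
```

In an actual end-to-end setup, this computation would be expressed in a framework with automatic differentiation (the paper's era suggests Theano/Lasagne), so gradients of the eigenvalues flow back into the network weights; the NumPy version above only illustrates the forward computation.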
