Generalized BackPropagation, Étude De Cas: Orthogonality

This paper introduces an extension of the backpropagation algorithm that enables us to have layers with constrained weights in a deep network. In particular, we make use of the Riemannian geometry and optimization techniques on matrix manifolds to step outside of normal practice in training deep networks, equipping the network with structures such as orthogonality or positive definiteness. Based on our development, we make another contribution by introducing the Stiefel layer, a layer with orthogonal weights. Among various applications, Stiefel layers can be used to design orthogonal filter banks, perform dimensionality reduction and feature extraction. We demonstrate the benefits of having orthogonality in deep networks through a broad set of experiments, ranging from unsupervised feature learning to fine-grained image classification.

[1]  Alan Edelman,et al.  The Geometry of Algorithms with Orthogonality Constraints , 1998, SIAM J. Matrix Anal. Appl..

[2]  Bamdev Mishra,et al.  Manopt, a matlab toolbox for optimization on manifolds , 2013, J. Mach. Learn. Res..

[3]  Arthur E. Bryson,et al.  Applied Optimal Control , 1969 .

[4]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[5]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[6]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[7]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Rémi Gribonval,et al.  Learning unions of orthonormal bases with thresholded singular value decomposition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[9]  Hal Daumé,et al.  Co-regularized Multi-view Spectral Clustering , 2011, NIPS.

[10]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[12]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[13]  S. Shankar Sastry,et al.  Generalized principal component analysis (GPCA) , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Jiwen Lu,et al.  PCANet: A Simple Deep Learning Baseline for Image Classification? , 2014, IEEE Transactions on Image Processing.

[17]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[18]  Harris Drucker,et al.  Learning algorithms for classification: A comparison on handwritten digit recognition , 1995 .

[19]  Terence D. Sanger,et al.  Optimal unsupervised learning in a single-layer linear feedforward neural network , 1989, Neural Networks.

[20]  Pierre-Antoine Absil,et al.  Joint Diagonalization on the Oblique Manifold for Independent Component Analysis , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[21]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[22]  Xavier Pennec,et al.  A Riemannian Framework for Tensor Computing , 2005, International Journal of Computer Vision.

[23]  Robert E. Mahony,et al.  Optimization Algorithms on Matrix Manifolds , 2007 .

[24]  Alan L. Yuille,et al.  Learning Deep Structured Models , 2014, ICML.

[25]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[26]  Simone G. O. Fiori,et al.  Image compression using principal component neural networks , 2001, Image Vis. Comput..

[27]  Le Song,et al.  Deep Fried Convnets , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Zoubin Ghahramani,et al.  Unifying linear dimensionality reduction , 2014, 1406.0873.

[29]  Wotao Yin,et al.  A feasible method for optimization with orthogonality constraints , 2013, Math. Program..

[30]  Boris Polyak Some methods of speeding up the convergence of iteration methods , 1964 .

[31]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[32]  Luc Van Gool,et al.  A Riemannian Network for SPD Matrix Learning , 2016, AAAI.

[33]  Gene H. Golub,et al.  Matrix computations , 1983 .

[34]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[35]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[36]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[37]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[38]  Aapo Hyvärinen,et al.  Fast and robust fixed-point algorithms for independent component analysis , 1999, IEEE Trans. Neural Networks.

[39]  Michael I. Jordan,et al.  Multiple Non-Redundant Spectral Clustering Views , 2010, ICML.

[40]  Huan Liu,et al.  Efficient Feature Selection via Analysis of Relevance and Redundancy , 2004, J. Mach. Learn. Res..

[41]  Basura Fernando,et al.  Learning End-to-end Video Classification with Rank-Pooling , 2016, ICML.

[42]  Bamdev Mishra,et al.  Fixed-rank matrix factorizations and Riemannian low-rank optimization , 2012, Comput. Stat..

[43]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[44]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[45]  Cristian Sminchisescu,et al.  Matrix Backpropagation for Deep Networks with Structured Layers , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[47]  Terence Sim,et al.  The CMU Pose, Illumination, and Expression Database , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[48]  Jiawei Han,et al.  Orthogonal Laplacianfaces for Face Recognition , 2006, IEEE Transactions on Image Processing.

[49]  Paul J. Werbos,et al.  Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.

[50]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[51]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.