Understanding Representations and Reducing their Redundancy in Deep Networks

Neural networks, in their modern deep learning incarnation, have achieved state-of-the-art performance on a wide variety of tasks and domains. A core intuition behind these methods is that they learn layers of features that gradually transform one domain into another (e.g., pixels into class labels) through a series of related intermediate representations. The first part of this thesis introduces the building blocks of neural networks for computer vision. It starts with linear models, then proceeds to deep multilayer perceptrons and convolutional neural networks, presenting the core details of each. The introduction also emphasizes intuition by visualizing concrete examples of the parts of a modern network.

The second part of this thesis investigates regularization of neural networks. Methods like dropout have been proposed to favor certain (empirically better) solutions over others, yet large deep networks still overfit easily. This part proposes a new regularizer called DeCov, which significantly reduces overfitting (the gap between training and validation performance) and improves generalization, sometimes more than dropout and sometimes less. The regularizer penalizes the cross-covariance of hidden representations, building on the intuition that different features should represent different things, an intuition others have explored with similar losses. Experiments across a range of datasets and network architectures demonstrate that DeCov reduces overfitting while almost always maintaining or improving generalization performance, often outperforming dropout.
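
To make the penalty concrete, here is a minimal NumPy sketch of a DeCov-style loss; the exact formulation and normalization used in the thesis may differ. The sketch assumes the penalty is half the squared Frobenius norm of the batch covariance matrix of a layer's activations, with the diagonal excluded, so that only cross-covariances between different hidden units are discouraged, not each unit's own variance.

```python
import numpy as np

def decov_loss(activations):
    """DeCov-style penalty on a batch of hidden activations.

    activations: array of shape (N, D) -- N examples, D hidden units.
    Returns 0.5 * (||C||_F^2 - ||diag(C)||_2^2), where C is the batch
    covariance matrix of the activations, so only the off-diagonal
    (cross-covariance) entries contribute to the loss.
    """
    n = activations.shape[0]
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n           # (D, D) covariance estimate
    frob_sq = np.sum(cov ** 2)                # squared Frobenius norm of C
    diag_sq = np.sum(np.diag(cov) ** 2)       # squared norm of C's diagonal
    return 0.5 * (frob_sq - diag_sq)
```

In training, a term like this would be computed on one or more hidden layers and added to the task loss with a small scaling weight, alongside or instead of dropout.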
