Regularizing Deep Neural Networks by Enhancing Diversity in Feature Extraction

This paper proposes a new and efficient technique for regularizing deep neural networks using correlations among features. Previous studies have shown that oversized deep neural network models tend to produce many redundant features that are either shifted versions of one another or are very similar and show little or no variation, resulting in redundant filtering. We propose a way to address this problem and show that such redundancy can be avoided with a regularization and adaptive feature dropout mechanism. We show that regularizing both negatively and positively correlated features according to their differentiation and their relative cosine distances yields networks that extract dissimilar features, with less overfitting and better generalization. The concept is illustrated with deep multilayer perceptrons, convolutional neural networks, sparse autoencoders, gated recurrent units, and long short-term memory networks on the MNIST digit recognition, CIFAR-10, ImageNet, and Stanford Natural Language Inference data sets.
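To make the decorrelation idea concrete, the following is a minimal sketch (not the paper's exact formulation) of a penalty that grows when a layer's feature vectors become strongly correlated, either positively or negatively, based on their pairwise cosine similarities. It is written in PyTorch, and the names `cosine_decorrelation_penalty` and `lambda_reg` are hypothetical; the adaptive feature dropout component is not shown.

```python
import torch

def cosine_decorrelation_penalty(weights: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean squared off-diagonal cosine similarity between feature vectors.

    weights: (num_features, input_dim) weight matrix of one layer,
             one feature (filter) per row.
    Returns a scalar that is large when features are redundant
    (strongly positively or negatively correlated) and small when
    they are close to mutually orthogonal.
    """
    # Normalize each feature vector to unit length.
    normed = weights / (weights.norm(dim=1, keepdim=True) + eps)
    # Pairwise cosine similarities between all feature vectors.
    sim = normed @ normed.t()
    # Zero out the diagonal (self-similarity is always 1).
    off_diag = sim - torch.diag(torch.diag(sim))
    # Squaring penalizes positive and negative correlation equally.
    return (off_diag ** 2).mean()

# Usage sketch: add the penalty to the task loss during training,
# e.g. loss = task_loss + lambda_reg * cosine_decorrelation_penalty(layer.weight)
```

Under this sketch, minimizing the combined loss pushes the layer's features apart in cosine distance, which is the effect the proposed regularizer aims for; the paper additionally drops features adaptively based on these distances.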
