Regularization Learning for Image Recognition

To reduce overfitting in image recognition, this paper proposes a novel regularization learning algorithm for deep learning. First, we propose a novel probabilistic representation of the architecture of Deep Neural Networks (DNNs), which shows that the hidden layers close to the input formulate prior distributions; DNNs therefore possess an explicit regularizer, namely these prior distributions. Second, we show that the backpropagation learning algorithm is a source of overfitting because it cannot guarantee that the prior distributions are learned precisely. Building on this theoretical explanation, we derive a novel regularization learning algorithm for DNNs. Whereas most existing regularization methods reduce overfitting by decreasing the training complexity of DNNs, the proposed method reduces overfitting by training the corresponding prior distributions more effectively, thereby yielding a stronger regularizer. Simulations demonstrate the proposed probabilistic representation on a synthetic dataset and validate the proposed regularization on the CIFAR-10 dataset.
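
For context, the conventional regularizers that the abstract contrasts against, such as dropout and L2 weight decay, reduce overfitting by constraining the effective complexity of the network. Below is a minimal PyTorch sketch of that conventional baseline for a small CIFAR-10 classifier; it is illustrative only and is not the regularization method proposed in this paper, and the network shape, `p_drop`, and optimizer hyperparameters are assumed values rather than settings taken from the paper.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Small CIFAR-10 classifier with dropout as an explicit regularizer."""

    def __init__(self, num_classes: int = 10, p_drop: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),                    # randomly zeroes activations during training
            nn.Linear(64 * 8 * 8, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = SmallCNN()
# L2 weight decay penalizes large weights, shrinking the hypothesis space;
# together with dropout this is the "decrease training complexity" family
# of regularizers described in the abstract.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
```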
