Regularizing neural networks with adaptive local drop

Neural network (NN) models have shown strong performance on many image recognition benchmarks. On large image datasets, these models typically have millions or billions of parameters and can easily overfit without regularization. Dropout and DropConnect have proven effective at regularizing the large fully connected layers within neural networks. In Dropout, each neuron activation in the network is randomly set to zero with a fixed probability during training. In DropConnect, a generalization of Dropout, each connection weight is instead randomly set to zero with a fixed probability. In both methods, the drop probability is a single predefined constant shared across the whole network. We propose Adaptive Local Drop (ALDrop), a novel regularization method that drops each connection weight with its own probability, learned from the input image dataset using a locality-based measure. Experiments on several image recognition benchmarks show that our model outperforms Dropout and DropConnect.
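
To make the three schemes concrete, here is a minimal NumPy sketch contrasting Dropout's per-activation mask, DropConnect's per-weight mask with a shared constant p, and an ALDrop-style per-weight mask. The function names and the probability matrix P are illustrative assumptions: the abstract does not specify how ALDrop's locality-based measure produces the learned probabilities, so P is a placeholder here.

```python
# Sketch of the three drop schemes described in the abstract (assumed shapes
# and helper names; the ALDrop probability matrix P is a placeholder, not the
# paper's actual locality-based, learned quantity).
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p):
    """Dropout: zero each activation independently with shared probability p."""
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)  # inverted scaling keeps the expected value

def dropconnect(W, p):
    """DropConnect: zero each connection weight independently with shared p."""
    mask = rng.random(W.shape) >= p
    return W * mask / (1.0 - p)

def aldrop(W, P):
    """ALDrop-style drop: zero weight W[i, j] with its own probability P[i, j].
    How P is learned from the data is the paper's contribution and is not
    reproduced here; a constant matrix stands in below."""
    mask = rng.random(W.shape) >= P
    return W * mask / (1.0 - P)  # per-weight inverted scaling (an assumption)

# Toy fully connected layer: 3 inputs, 4 outputs, ReLU nonlinearity.
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
h = np.maximum(W @ x, 0.0)

h_do = dropout(h, p=0.5)                           # mask on activations
h_dc = np.maximum(dropconnect(W, 0.5) @ x, 0.0)    # mask on weights, shared p
P = np.full(W.shape, 0.5)                          # placeholder "learned" P
h_al = np.maximum(aldrop(W, P) @ x, 0.0)           # mask on weights, per-weight p
```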
