Batch-normalized Mlpconv-wise supervised pre-training network in network

Deep multi-layered neural networks contain many levels of nonlinearity, which allows them to represent highly varying nonlinear functions compactly. In this paper, we propose a new deep architecture with enhanced discriminative ability, which we refer to as the mlpconv-wise supervised pre-training network in network (MPNIN). MPNIN facilitates information abstraction within local receptive fields. The architecture builds on the recently developed network in network (NIN) structure, which slides a universal approximator, such as a multilayer perceptron with rectifier units, across an image to extract features. However, randomly initializing a NIN can lead gradient-based optimization to poor solutions. We remedy this defect with mlpconv-wise supervised pre-training, which helps overcome the difficulties of training deep networks by providing better initial weights for every layer. In addition, batch normalization is applied to reduce internal covariate shift by pre-conditioning the model. Empirical investigations are conducted on the Mixed National Institute of Standards and Technology (MNIST), Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100), Street View House Numbers (SVHN), US Postal Service (USPS), Columbia University Image Library (COIL20 and COIL100), and Olivetti Research Ltd (ORL) datasets, and the results verify the effectiveness of the proposed MPNIN architecture.
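To make the abstract's description concrete, the following minimal sketch (written in PyTorch, which is not the implementation used in the paper) shows one possible mlpconv block with batch normalization, together with a hypothetical mlpconv-wise supervised pre-training loop that attaches a temporary softmax head after each newly added block. The channel sizes, kernel size, layer ordering, and the `train_step` callback are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn as nn


class MlpConvBlock(nn.Module):
    """One mlpconv block: a spatial convolution followed by two 1x1
    convolutions (the 'micro' MLP of NIN), each with batch normalization
    and ReLU. Channel and kernel sizes here are illustrative assumptions."""

    def __init__(self, in_channels, out_channels, kernel_size=5, padding=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
            nn.BatchNorm2d(out_channels),   # reduces internal covariate shift
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),  # 1x1 "MLP" layer
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),  # second 1x1 layer
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


def pretrain_mlpconv_wise(blocks, train_step, num_classes=10):
    """Hypothetical mlpconv-wise supervised pre-training loop: blocks are
    stacked one at a time, and after each addition the partial network is
    trained under supervision through a temporary classifier head.
    `train_step` is a user-supplied training routine (an assumption)."""
    stack = nn.Sequential()
    for i, block in enumerate(blocks):
        stack.add_module(f"block{i}", block)
        head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),        # global average pooling
            nn.Flatten(),
            nn.LazyLinear(num_classes),     # temporary softmax classifier
        )
        train_step(nn.Sequential(stack, head))
    return stack  # pre-trained weights then initialize the full MPNIN
```

In this sketch, the supervised pre-training of each block supplies the initial weights for the full MPNIN, which would then be fine-tuned end to end; whether earlier blocks are frozen during each stage is left open, as the sketch does not fix that detail.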
