Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

In this paper we propose and investigate a novel nonlinear unit, called the Lp unit, for deep neural networks. The proposed Lp unit receives signals from several projections of a subset of units in the layer below and computes a normalized Lp norm over them. We note two interesting interpretations of the Lp unit. First, the proposed unit can be understood as a generalization of a number of conventional pooling operators, such as average, root-mean-square and max pooling, widely used in, for instance, convolutional neural networks (CNNs), HMAX models and neocognitrons. Furthermore, the Lp unit is, to a certain degree, similar to the recently proposed maxout unit [38], which achieved state-of-the-art object recognition results on a number of benchmark datasets. Second, we provide a geometrical interpretation of the activation function, based on which we argue that the Lp unit is more efficient at representing complex, nonlinear separating boundaries: each Lp unit defines a superelliptic boundary whose exact shape is determined by its order p. We claim that this makes it possible to model arbitrarily shaped, curved boundaries more efficiently by combining a few Lp units of different orders. This insight justifies learning a different order for each unit in the model. We empirically evaluate the proposed Lp unit on a number of datasets and show that multilayer perceptrons (MLPs) built from Lp units achieve state-of-the-art results on several benchmarks. Furthermore, we evaluate the proposed Lp unit on the recently proposed deep recurrent neural networks (RNNs).
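
To make the pooling operation concrete, the following is a minimal NumPy sketch of a single Lp unit with a learnable order. The softplus reparameterization of the order (p = 1 + softplus(rho)) and the names W, c and rho are illustrative assumptions, not details taken from the abstract; the abstract only specifies that the unit pools several linear projections of its inputs through a normalized Lp norm.

import numpy as np

def lp_unit(x, W, c, rho):
    """One Lp unit: a normalized Lp norm over N linear projections of x.

    W (N x d) and c (N,) are the projection parameters; rho is a free scalar
    from which the order p >= 1 is obtained. All names are illustrative.
    """
    p = 1.0 + np.log1p(np.exp(rho))              # learned order, constrained to p >= 1
    z = W @ x + c                                # N projections of the input
    return np.mean(np.abs(z) ** p) ** (1.0 / p)  # normalized Lp norm of the projections

# Example: one unit pooling N = 4 projections of a 10-dimensional input.
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
W = rng.standard_normal((4, 10))
c = np.zeros(4)
print(lp_unit(x, W, c, rho=0.0))

With p = 1 the unit reduces to the average of the absolute projections, p = 2 gives root-mean-square pooling, and as p grows the normalized norm approaches the maximum of the projections, which is where the connection to max pooling and, loosely, to the maxout unit comes from. Learning rho therefore lets each unit pick its own point on this spectrum.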

[1] F. Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, 1962.

[2] D. Hubel et al. Receptive fields and functional architecture of monkey striate cortex, 1968, The Journal of Physiology.

[3] Geoffrey E. Hinton et al. Learning representations by back-propagating errors, 1986, Nature.

[6] Kurt Hornik et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[7] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[8] T. Poggio et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[9] Kunihiko Fukushima et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, 1980, Biological Cybernetics.

[11] Marc'Aurelio Ranzato et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12] A. Hyvärinen et al. Complex cell pooling and the statistics of natural images, 2007, Network.

[13] M. Trebar et al. Application of distributed SVM architectures in classifying forest data cover types, 2008.

[14] Yihong Gong et al. Linear spatial pyramid matching using sparse coding for image classification, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Yann LeCun et al. What is the best multi-stage architecture for object recognition?, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16] Jean Ponce et al. A Theoretical Analysis of Feature Pooling in Visual Recognition, 2010, ICML.

[17] Razvan Pascanu et al. Theano: A CPU and GPU Math Compiler in Python, 2010, SciPy.

[18] Geoffrey E. Hinton et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.

[19] Simon Haykin et al. Neural Networks and Learning Machines, 2010.

[20] Yoshua Bengio et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.

[21] Pascal Vincent et al. The Manifold Tangent Classifier, 2011, NIPS.

[22] Yoshua Bengio et al. Suitability of V1 Energy Models for Object Classification, 2011, Neural Computation.

[23] Nitish Srivastava et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, arXiv.

[24] Yoshua Bengio et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.

[25] Yoshua Bengio et al. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription, 2012, ICML.

[26] Jürgen Schmidhuber et al. Multi-column deep neural network for traffic sign classification, 2012, Neural Networks.

[27] Razvan Pascanu et al. Theano: new features and speed improvements, 2012, arXiv.

[28] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.

[29] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[30] Tara N. Sainath et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.

[31] Pascal Vincent et al. Disentangling Factors of Variation for Facial Expression Recognition, 2012, ECCV.

[32] Yoshua Bengio et al. Deep Learning of Representations, 2013, Handbook on Neural Information Processing.

[33] Geoffrey E. Hinton et al. Modeling Natural Images Using Gated MRFs, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Razvan Pascanu et al. Learned-norm pooling for deep neural networks, 2013, arXiv.

[35] Ian J. Goodfellow et al. Pylearn2: a machine learning research library, 2013, arXiv.

[36] Razvan Pascanu et al. On the difficulty of training recurrent neural networks, 2012, ICML.

[37] Yoshua Bengio et al. High-dimensional sequence transduction, 2012, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[38] Yoshua Bengio et al. Maxout Networks, 2013, ICML.

[39] Razvan Pascanu et al. How to Construct Deep Recurrent Neural Networks, 2013, ICLR.

[40] Christian Osendorfer et al. On Fast Dropout and its Applicability to Recurrent Networks, 2013, ICLR.

[41] Razvan Pascanu et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.

[42] Yoshua Bengio et al. Knowledge Matters: Importance of Prior Information for Optimization, 2013, J. Mach. Learn. Res.