About pyramid structure in convolutional neural networks

Deep convolutional neural networks (CNNs) have undoubtedly revolutionized various challenging tasks, mainly in computer vision. However, their design still requires attention to reduce the number of learnable parameters without a meaningful reduction in performance. In this paper we investigate to what extent CNNs may take advantage of the pyramid structure typical of biological neurons. We introduce a generalized statement over convolutional layers, from the input to the fully connected layer, that helps in understanding and designing a successful deep network. It reduces ambiguity, the number of parameters, and their size on disk without degrading overall accuracy. Performance is shown on state-of-the-art models for the MNIST, CIFAR-10, CIFAR-100, and ImageNet-12 datasets. Despite a reduction of more than 80% in parameters for Caffe LeNet, competitive results are obtained. Further, despite a 10-20% reduction in training data along with a 10-40% reduction in parameters for the AlexNet model and its variations, competitive results are achieved when compared to similar well-engineered deeper architectures.
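To make the parameter-count claim concrete, the following minimal sketch shows how the number of learnable parameters in a stack of convolutional layers depends on the number of feature maps per layer. The helper names (`conv_params`, `stack_params`) and the second layer configuration are hypothetical and chosen only for illustration; they are not the configurations or the design rule used in the paper.

```python
def conv_params(in_ch, out_ch, k):
    """Learnable parameters of one k x k convolutional layer (weights + biases)."""
    return out_ch * (in_ch * k * k + 1)


def stack_params(in_ch, widths, k):
    """Total parameters of consecutive k x k conv layers with the given output widths."""
    total = 0
    for out_ch in widths:
        total += conv_params(in_ch, out_ch, k)
        in_ch = out_ch  # this layer's output feeds the next layer
    return total


# Illustrative comparison on a single-channel input with 5 x 5 kernels.
# Both width lists are assumptions made for this example only.
baseline = stack_params(1, [20, 50], 5)
alternative = stack_params(1, [20, 10], 5)
reduction = 1 - alternative / baseline

print(f"baseline: {baseline}, alternative: {alternative}, reduction: {reduction:.1%}")
```

The sketch counts convolutional parameters only. In a full network the fully connected layers usually hold a large share of the parameters, so the overall reduction depends on how the last convolutional layer's output size changes as well.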
