Factorized Bilinear Models for Image Recognition

Although Deep Convolutional Neural Networks (CNNs) have liberated their power in various computer vision tasks, the most important components of CNN, convolutional layers and fully connected layers, are still limited to linear transformations. In this paper, we propose a novel Factorized Bilinear (FB) layer to model the pairwise feature interactions by considering the quadratic terms in the transformations. Compared with existing methods that tried to incorporate complex non-linearity structures into CNNs, the factorized parameterization makes our FB layer only require a linear increase of parameters and affordable computational cost. To further reduce the risk of overfitting of the FB layer, a specific remedy called DropFactor is devised during the training process. We also analyze the connection between FB layer and some existing models, and show FB layer is a generalization to them. Finally, we validate the effectiveness of FB layer on several widely adopted datasets including CIFAR-10, CIFAR-100 and ImageNet, and demonstrate superior results compared with various state-of-the-art deep models.

[1]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[8]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[9]  Gregory Shakhnarovich,et al.  FractalNet: Ultra-Deep Neural Networks without Residuals , 2016, ICLR.

[10]  Peter Kontschieder,et al.  Deep Neural Decision Forests , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Subhransu Maji,et al.  One-to-many face recognition with bilinear CNNs , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[12]  Steffen Rendle,et al.  Factorization Machines , 2010, 2010 IEEE International Conference on Data Mining.

[13]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[14]  Rasmus Pagh,et al.  Fast and scalable polynomial kernels via explicit feature maps , 2013, KDD.

[15]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[16]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[17]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[18]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[19]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[20]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[22]  Nanning Zheng,et al.  Similarity learning on an explicit polynomial kernel feature map for person re-identification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Harish Karnick,et al.  Random Feature Maps for Dot Product Kernels , 2012, AISTATS.

[24]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[25]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[26]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[28]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[29]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[30]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[32]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[34]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[35]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Li Xu,et al.  On Vectorization of Deep Convolutional Neural Networks for Vision Tasks , 2015, AAAI.

[37]  Yangqing Jia,et al.  Learning Semantic Image Representations at a Large Scale , 2014 .

[38]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[39]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[40]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[41]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[42]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.