Convolution Neural Networks With Two Pathways for Image Style Recognition

Automatic recognition of an image’s style is important for many applications, including artwork analysis, photo organization, and image retrieval. Traditional convolution neural network (CNN) approach uses only object features for image style recognition. This approach may not be optimal, because the same object in two images may have different styles. We propose a CNN architecture with two pathways extracting object features and texture features, respectively. The object pathway represents the standard CNN architecture and the texture pathway intermixes the object pathway by outputting the gram matrices of intermediate features in the object pathway. The two pathways are jointly trained. In experiments, two deep CNNs, AlexNet and VGG-19, pretrained on the ImageNet classification data set are fine-tuned for this task. For any model, the two-pathway architecture performs much better than individual pathways, which indicates that the two pathways contain complementary information of an image’s style. In particular, the model based on VGG-19 achieves the state-of-the-art results on three benchmark data sets, WikiPaintings, Flickr Style, and AVA Style.

[1]  Andrea Vedaldi,et al.  Texture Networks: Feed-forward Synthesis of Textures and Stylized Images , 2016, ICML.

[2]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Irfan A. Essa,et al.  Graphcut textures: image and video synthesis using graph cuts , 2003, ACM Trans. Graph..

[5]  Stefan Winkler,et al.  Inferring Painting Style with Multi-Task Dictionary Learning , 2015, IJCAI.

[6]  Leon A. Gatys,et al.  Texture Synthesis Using Convolutional Neural Networks , 2015, NIPS.

[7]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[8]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[9]  Trevor Darrell,et al.  Recognizing Image Style , 2013, BMVC.

[10]  Karolin Baecker,et al.  Two Dimensional Signal And Image Processing , 2016 .

[11]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[12]  Lior Shamir,et al.  Impressionism, expressionism, surrealism: Automated recognition of painters and schools of art , 2010, TAP.

[13]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[14]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Daniel Keren,et al.  Painter identification using local features and naive Bayes , 2002, Object recognition supported by user interaction for service robots.

[16]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[17]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[18]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Eero P. Simoncelli,et al.  A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients , 2000, International Journal of Computer Vision.

[20]  Jonathon Shlens,et al.  A Learned Representation For Artistic Style , 2016, ICLR.

[21]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[24]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[25]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[26]  Muhammad Sarfraz Computer Vision and Image Processing in Intelligent Systems and Multimedia Technologies , 2014 .

[27]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[28]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[29]  Béla Julesz,et al.  Visual Pattern Discrimination , 1962, IRE Trans. Inf. Theory.

[30]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Alexei A. Efros,et al.  Image quilting for texture synthesis and transfer , 2001, SIGGRAPH.

[33]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[34]  Lior Wolf,et al.  Classification of Artistic Styles Using Binarized Features Derived from a Deep Neural Network , 2014, ECCV Workshops.

[35]  Radomír Mech,et al.  Deep Multi-patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Robert M. Haralick,et al.  Textural Features for Image Classification , 1973, IEEE Trans. Syst. Man Cybern..