Fine-tuning deep convolutional neural networks for distinguishing illustrations from photographs

Systems that aggregate illustrations need a function that automatically distinguishes illustrations from photographs as they crawl the web to collect images. A previous attempt to implement this function with hand-designed basic features that were deemed useful for classification achieved an accuracy of only about 58%. Deep neural networks, on the other hand, have been successful in computer vision tasks, and convolutional neural networks (CNNs) have performed well at extracting useful image features automatically. We evaluated alternative methods for implementing this classification function, with a focus on deep neural networks. In our experiments, fine-tuning a deep convolutional neural network (DCNN) achieved 96.8% accuracy, outperforming the other models, including those based on handcrafted features and custom CNN models trained from scratch. We conclude that fine-tuning a DCNN was the best of the evaluated methods for automatically distinguishing illustrations from photographs.
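
The idea of fine-tuning can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the architecture, framework, or hyperparameters, so everything here is an assumption made for illustration. A tiny 2x2 weight matrix (`PRETRAINED_W`, hypothetical) stands in for pretrained convolutional layers, `DATA` is a made-up toy dataset, and the learning rates are arbitrary. The sketch starts from the pretrained weights, attaches a new two-class output layer, and then trains both, using a smaller learning rate for the pretrained part.

```python
import math

# Hypothetical "pretrained" feature-extractor weights (2x2), standing in for
# the convolutional layers of a network trained on a large generic dataset.
PRETRAINED_W = [[1.0, 0.0], [0.0, 1.0]]

# Made-up toy inputs with two features each: 1 = illustration, 0 = photograph.
DATA = [
    ((0.9, 0.1), 1), ((0.8, 0.3), 1), ((0.7, 0.2), 1), ((0.6, 0.1), 1),
    ((0.1, 0.9), 0), ((0.3, 0.8), 0), ((0.2, 0.7), 0), ((0.1, 0.6), 0),
]


def forward(W, v, b, x):
    """Return hidden features h and the predicted probability of 'illustration'."""
    h = [math.tanh(W[j][0] * x[0] + W[j][1] * x[1]) for j in range(2)]
    z = v[0] * h[0] + v[1] * h[1] + b
    return h, 1.0 / (1.0 + math.exp(-z))


def accuracy(W, v, b, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int((forward(W, v, b, x)[1] > 0.5) == bool(y)) for x, y in data)
    return correct / len(data)


def fine_tune(data, epochs=200, lr_head=0.5, lr_body=0.05):
    """Full-batch gradient descent on binary cross-entropy.

    The new output layer (v, b) starts at zero and uses lr_head; the
    pretrained weights W are copied and adjusted with the smaller lr_body.
    """
    W = [row[:] for row in PRETRAINED_W]  # copy, so the original stays intact
    v, b = [0.0, 0.0], 0.0                # freshly initialised two-class head
    n = len(data)
    for _ in range(epochs):
        gW = [[0.0, 0.0], [0.0, 0.0]]
        gv, gb = [0.0, 0.0], 0.0
        for x, y in data:
            h, p = forward(W, v, b, x)
            dz = p - y                    # gradient of the loss w.r.t. the logit
            gb += dz
            for j in range(2):
                gv[j] += dz * h[j]
                da = dz * v[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                for k in range(2):
                    gW[j][k] += da * x[k]
        b -= lr_head * gb / n
        for j in range(2):
            v[j] -= lr_head * gv[j] / n
            for k in range(2):
                W[j][k] -= lr_body * gW[j][k] / n
    return W, v, b


if __name__ == "__main__":
    W, v, b = fine_tune(DATA)
    print("accuracy before:", accuracy(PRETRAINED_W, [0.0, 0.0], 0.0, DATA))
    print("accuracy after: ", accuracy(W, v, b, DATA))
```

In a real system the feature extractor would be a DCNN pretrained on a large image dataset, and accuracy would be measured on held-out images rather than on the training data as in this toy example.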
