SwiDeN: Convolutional Neural Networks For Depiction Invariant Object Recognition

Current state of the art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g. photos only, sketches only). In this paper, we present SwiDeN: our Convolutional Neural Network (CNN) architecture which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph etc.). In SwiDeN, we utilize a novel `deep' depictive style-based switching mechanism which appropriately addresses the depiction-specific and depiction-invariant aspects of the problem. We compare SwiDeN with alternative architectures and prior work on a 50-category Photo-Art dataset containing objects depicted in multiple styles. Experimental results show that SwiDeN outperforms other approaches for the depiction-invariant object recognition problem.

[1]  Tao Xiang,et al.  Sketch-a-Net that Beats Humans , 2015, BMVC.

[2]  CipollaRoberto,et al.  Multiscale Categorical Object Recognition Using Contour Fragments , 2008 .

[3]  Hongping Cai,et al.  Learning Graphs to Model Visual Objects across Different Depictive Styles , 2014, ECCV.

[4]  Andrew Blake,et al.  Multiscale Categorical Object Recognition Using Contour Fragments , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[7]  Shimon Ullman,et al.  Atoms of recognition in human and computer vision , 2016, Proceedings of the National Academy of Sciences.

[8]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[9]  Qi Wu,et al.  Beyond Photo-Domain Object Recognition: Benchmarks for the Cross-Depiction Problem , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[10]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[11]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[12]  Xiao Bai,et al.  Learning invariant structure for object identification by using graph methods , 2011, Comput. Vis. Image Underst..

[13]  Alexei A. Efros,et al.  Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph..

[14]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[15]  Hongping Cai,et al.  The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs , 2015, ArXiv.

[16]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[17]  Sanja Fidler,et al.  Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.