The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs

The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc. It is a potentially significant yet under-researched problem. Emulating the remarkable human ability to recognise objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of Computer Vision. In this paper we benchmark classification, domain adaptation, and deep learning methods, demonstrating that none performs consistently well on the cross-depiction problem. Given the current interest in deep learning, it is notable that such methods exhibit the same behaviour as all but one other method: they show a significant fall in performance over inhomogeneous databases compared to their peak performance, which is always over data comprising photographs only. Instead, we find that methods with strong models of spatial relations between parts tend to be more robust, and we therefore conclude that such information is important in modelling object classes regardless of appearance details.
