Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors

Image composition is a tool of great significance to art historians for the analysis of artworks. Compositions are useful for analyzing the interactions within an image and thereby for studying artists and their artworks. Max Imdahl, in his work Ikonik, along with other prominent art historians of the 20th century, underlined the aesthetic and semantic importance of the structural composition of an image. Understanding the underlying compositional structures within images is a challenging and time-consuming task. Generating these structures automatically using computer vision techniques can (1) help art historians in their sophisticated analysis by saving a lot of time and providing an overview of, and access to, huge image repositories, and (2) provide an important step towards an understanding of man-made imagery by machines. In this work, we attempt to automate this process using existing state-of-the-art machine learning techniques, without involving any form of training. Our approach, inspired by Max Imdahl's pioneering work, focuses on two central themes of image composition: (a) detection of action regions and action lines of the artwork; and (b) pose-based segmentation of foreground and background. Currently, our approach works for artworks that contain protagonists (persons). In order to validate our approach qualitatively and quantitatively, we conduct a user study involving experts and non-experts. The outcome of the study correlates highly with our approach and also demonstrates its domain-agnostic capability. We have open-sourced the code at this https URL.
