The Focus-Aspect-Value Model for Explainable Prediction of Subjective Visual Interpretation

Subjective visual interpretation is a challenging yet important topic in computer vision. Many approaches reduce this problem to the prediction of adjective or attribute labels from images. However, most of these either ignore attribute semantics or process the image only holistically. Furthermore, there is a lack of relevant datasets with fine-grained subjective labels. In this paper, we propose the Focus-Aspect-Value (FAV) model to structure the process of capturing subjectivity in image processing, and introduce a novel dataset that follows this model. We run experiments on this dataset to compare several deep learning methods and find that incorporating context information through tensor multiplication outperforms the default method of information fusion (concatenation).
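The difference between the two fusion strategies named above can be illustrated with a minimal sketch. The abstract does not specify the network architecture, so everything here is an assumption for illustration only: the function names `fuse_concat` and `fuse_bilinear`, the toy feature vectors, and the weights are hypothetical and are not taken from the paper. Concatenation applies one linear map to the stacked vectors, so image and context features contribute additively; tensor multiplication weights every pairwise product of an image feature and a context feature, so the context can modulate how the image features are read.

```python
# Illustrative sketch only: names, shapes, and weights are assumptions,
# not the architecture used in the paper.

def fuse_concat(image_feat, context_feat, weight):
    """Concatenation fusion: stack both vectors, then apply one linear map.

    weight has shape (d_out, d_img + d_ctx).
    """
    joint = list(image_feat) + list(context_feat)
    return [sum(w * x for w, x in zip(row, joint)) for row in weight]


def fuse_bilinear(image_feat, context_feat, tensor):
    """Tensor-multiplication (bilinear) fusion.

    tensor has shape (d_out, d_img, d_ctx) and
    out[k] = sum over i, j of tensor[k][i][j] * image_feat[i] * context_feat[j].
    """
    return [
        sum(
            slab[i][j] * a * b
            for i, a in enumerate(image_feat)
            for j, b in enumerate(context_feat)
        )
        for slab in tensor
    ]


# Toy inputs: a 2-dimensional image feature and a 1-dimensional context feature.
image_feat = [1.0, 2.0]
context_feat = [3.0]

w_concat = [[1.0, 0.0, 1.0]]      # shape (1, 3)
w_tensor = [[[1.0], [1.0]]]       # shape (1, 2, 1)

concat_out = fuse_concat(image_feat, context_feat, w_concat)
bilinear_out = fuse_bilinear(image_feat, context_feat, w_tensor)
```

In this toy setting, setting the context vector to zero removes only the context term from the concatenation output, while the bilinear output becomes zero entirely, which shows that the bilinear form models an interaction rather than a sum of independent contributions.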
