Joint Image and Text Representation for Aesthetics Analysis

Image aesthetics assessment is essential to multimedia applications such as image retrieval, and personalized image search and recommendation. Primarily relying on visual information and manually-supplied ratings, previous studies in this area have not adequately utilized higher-level semantic information. We incorporate additional textual phrases from user comments to jointly represent image aesthetics utilizing multimodal Deep Boltzmann Machine. Given an image, without requiring any associated user comments, the proposed algorithm automatically infers the joint representation and predicts the aesthetics category of the image. We construct the AVA-Comments dataset to systematically evaluate the performance of the proposed algorithm. Experimental results indicate that the proposed joint representation improves the performance of aesthetics assessment on the benchmarking AVA dataset, comparing with only visual features.

[1]  Marc'Aurelio Ranzato,et al.  Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews , 2014, ICLR.

[2]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[3]  Lukás Burget,et al.  Recurrent Neural Network Based Language Modeling in Meeting Recognition , 2011, INTERSPEECH.

[4]  Xiaoou Tang,et al.  Photo and Video Quality Evaluation: Focusing on the Subject , 2008, ECCV.

[5]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[6]  Christopher D. Manning,et al.  Baselines and Bigrams: Simple, Good Sentiment and Topic Classification , 2012, ACL.

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[9]  Mubarak Shah,et al.  A framework for photo-quality assessment and enhancement based on visual aesthetics , 2010, ACM Multimedia.

[10]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[11]  James Ze Wang,et al.  Studying Aesthetics in Photographic Images Using a Computational Approach , 2006, ECCV.

[12]  Yan Ke,et al.  The Design of High-Level Features for Photo Quality Assessment , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Prasanth Veerina Learning Good Taste : Classifying Aesthetic Images , 2015 .

[14]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  James Zijun Wang,et al.  RAPID: Rating Pictorial Aesthetics using Deep Learning , 2014, ACM Multimedia.

[16]  Tao Mei,et al.  Query-Dependent Aesthetic Model With Deep Learning for Photo Quality Assessment , 2015, IEEE Transactions on Multimedia.

[17]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[18]  Masashi Nishiyama,et al.  Aesthetic quality classification of photographs based on color harmony , 2011, CVPR 2011.