Assistive Image Comment Robot—A Novel Mid-Level Concept-Based Representation

We present a general framework and working system for predicting the likely affective responses of viewers after an image is posted in a social media environment. Our approach emphasizes a mid-level concept representation: the intended affects of the image publisher are characterized by a large pool of visual concepts (termed PACs) detected directly from image content rather than from textual metadata; the affects evoked in viewers are represented by concepts (termed VACs) mined from online comments; and statistical methods are used to model the correlations between these two types of concepts. We demonstrate the utility of this approach by developing an end-to-end Assistive Comment Robot application, which further includes components for multi-sentence comment generation, an interactive interface, and relevance feedback. In user studies, machine-suggested comments were accepted by users for online posting in 90 percent of completed sessions, and very favorable results were also observed along several dimensions (plausibility, preference, and realism) when assessing the quality of the generated comments.
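
As a rough, hypothetical illustration of the mid-level pipeline summarized above, the sketch below maps detected PAC confidences to likely VACs through a correlation matrix and then to template-based comment candidates. All concept names, scores, correlation values, and templates are invented placeholders; the paper's actual detectors, statistical correlation models, and multi-sentence generation component are not reproduced here.

    # Hypothetical sketch only: mapping detected publisher affective concepts (PACs)
    # to viewer affective concepts (VACs) and then to candidate comments.
    import numpy as np

    # Illustrative PAC detector output for one image: a confidence per visual concept.
    pac_names = ["cute dog", "sunny beach", "sad face"]
    pac_scores = np.array([0.85, 0.10, 0.05])

    # Illustrative PAC-to-VAC correlation matrix; in the real system such weights
    # would be estimated statistically from large collections of (image, comment) pairs.
    vac_names = ["adorable", "relaxing", "sympathy"]
    correlation = np.array([
        [0.90, 0.05, 0.05],  # "cute dog"    -> mostly "adorable"
        [0.10, 0.85, 0.05],  # "sunny beach" -> mostly "relaxing"
        [0.05, 0.05, 0.90],  # "sad face"    -> mostly "sympathy"
    ])

    # Weight each PAC's correlation row by its detection confidence and normalize.
    vac_scores = pac_scores @ correlation
    vac_scores /= vac_scores.sum()

    # Turn the top-ranked VACs into simple template-based comment candidates.
    templates = {
        "adorable": "So adorable, I can't stop smiling!",
        "relaxing": "Looks so relaxing, wish I were there.",
        "sympathy": "Sending hugs, hope things get better soon.",
    }
    for idx in np.argsort(vac_scores)[::-1][:2]:
        print(f"{vac_names[idx]} ({vac_scores[idx]:.2f}): {templates[vac_names[idx]]}")

A candidate selected by the user, possibly refined through the relevance feedback interface described in the paper, would then be suggested for posting.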
