Senti-Attend: Image Captioning using Sentiment and Attention

There has been much recent work on image captioning models that describe the factual aspects of an image; more recently, some models have also incorporated non-factual aspects into the captions, such as sentiment or style. However, such models typically have difficulty balancing the semantic content of the image against the non-factual dimensions of the caption; moreover, humans appear to focus on different aspects of an image depending on the chosen sentiment or style of the caption. To address this, we design an attention-based model that better incorporates sentiment into image captions. The model embeds and learns sentiment with respect to image-caption data, and uses both high-level and word-level sentiment information during training. It outperforms the state-of-the-art in sentiment-aware image captioning on standard evaluation metrics. An analysis of the generated captions also shows that our model achieves this through better selection of sentiment-bearing adjectives and adjective-noun pairs.
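
To make the architecture description above more concrete, the sketch below shows one plausible form of an attention-based caption decoder conditioned on a high-level sentiment label, written in PyTorch. All class, method, and parameter names (SentiAttendDecoder, senti_embed, senti_dim, and so on) are illustrative assumptions rather than the authors' actual implementation, and the word-level sentiment signal (for example, extra supervision on sentiment-bearing words) is omitted for brevity.

    # A minimal sketch of the idea described above, assuming a PyTorch setting.
    # Names and dimensions are illustrative assumptions, not the authors' code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SentiAttendDecoder(nn.Module):
        """Attention-based caption decoder conditioned on a sentiment label."""

        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                     feat_dim=2048, senti_dim=64, num_sentiments=3):
            super().__init__()
            self.word_embed = nn.Embedding(vocab_size, embed_dim)
            # High-level sentiment (e.g. positive / neutral / negative) embedding.
            self.senti_embed = nn.Embedding(num_sentiments, senti_dim)
            # Soft attention over spatial image features.
            self.att_feat = nn.Linear(feat_dim, hidden_dim)
            self.att_hid = nn.Linear(hidden_dim, hidden_dim)
            self.att_score = nn.Linear(hidden_dim, 1)
            # LSTM input: previous word + attended feature + sentiment embedding.
            self.lstm = nn.LSTMCell(embed_dim + feat_dim + senti_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def attend(self, feats, h):
            # feats: (B, R, feat_dim) spatial features, h: (B, hidden_dim) state.
            scores = self.att_score(torch.tanh(
                self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))  # (B, R, 1)
            alpha = F.softmax(scores, dim=1)
            return (alpha * feats).sum(dim=1)                          # (B, feat_dim)

        def forward(self, feats, captions, sentiment):
            # feats: (B, R, feat_dim); captions: (B, T) word ids; sentiment: (B,) ids.
            B, T = captions.shape
            h = feats.new_zeros(B, self.lstm.hidden_size)
            c = feats.new_zeros(B, self.lstm.hidden_size)
            s = self.senti_embed(sentiment)                            # (B, senti_dim)
            logits = []
            for t in range(T):
                w = self.word_embed(captions[:, t])                    # (B, embed_dim)
                ctx = self.attend(feats, h)                            # (B, feat_dim)
                h, c = self.lstm(torch.cat([w, ctx, s], dim=1), (h, c))
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)                          # (B, T, vocab)

Under these assumptions, such a decoder would be trained with a standard cross-entropy loss over caption tokens, with the desired sentiment id supplied alongside each training caption so that the sentiment embedding can influence both the attention and the word choice at every step.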
