论文信息 - Image2Emoji: Zero-shot Emoji Prediction for Visual Media

Image2Emoji: Zero-shot Emoji Prediction for Visual Media

We present Image2Emoji, a multi-modal approach for generating emoji labels for an image in a zero-shot manner. Different from existing zero-shot image-to-text approaches, we exploit both image and textual media to learn a semantic embedding for the new task of emoji prediction. We propose that the widespread adoption of emoji suggests a semantic universality which is well-suited for interaction with visual media. We quantify the efficacy of our proposed model on the MSCOCO dataset, and demonstrate the value of visual, textual and multi-modal prediction of emoji. We conclude the paper with three examples of the application potential of emoji in the context of multimedia retrieval.

[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2] Samy Bengio,et al. Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[3] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[4] David A. Shamma,et al. The New Data and New Challenges in Multimedia Research , 2015, ArXiv.

[5] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[7] Rongrong Ji,et al. Large-scale visual sentiment ontology and detectors using adjective noun pairs , 2013, ACM Multimedia.