Cross-lingual Visual Verb Sense Disambiguation

Recent work has shown that visual context improves cross-lingual sense disambiguation for nouns. We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs. Each image in MultiSense is annotated with an English verb and its translation in German or Spanish. We show that cross-lingual verb sense disambiguation models benefit from visual context, compared to unimodal baselines. We also show that the verb sense predicted by our best disambiguation model can improve the results of a text-only machine translation system when used for a multimodal translation task.

[1]  Rico Sennrich,et al.  The University of Edinburgh’s Neural MT Systems for WMT17 , 2017, WMT.

[2]  Stephen Clark,et al.  Visual Bilingual Lexicon Induction with Transferred ConvNet Features , 2015, EMNLP.

[3]  Joost van de Weijer,et al.  LIUM-CVC Submissions for WMT18 Multimodal Translation Task , 2018, WMT.

[4]  Geoffrey Zweig,et al.  From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  David A. Forsyth,et al.  Discriminating Image Senses by Clustering with Multimodal Features , 2006, ACL.

[6]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[7]  Alon Lavie,et al.  Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.

[8]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[10]  Frank Keller,et al.  Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings , 2016, NAACL.

[11]  Jindřich Helcl,et al.  CUNI System for the WMT18 Multimodal Translation Task , 2018, WMT.

[12]  Lucia Specia,et al.  SHEF-Multimodal: Grounding Machine Translation on Images , 2016, WMT.

[13]  Matt Post,et al.  Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation , 2018, NAACL.

[14]  Frank Keller,et al.  Image Pivoting for Learning Multilingual Multimodal Representations , 2017, EMNLP.

[15]  Marie-Francine Moens,et al.  Multi-Modal Representations for Improved Bilingual Lexicon Learning , 2016, ACL.

[16]  Khalil Sima'an,et al.  A Shared Task on Multimodal Machine Translation and Crosslingual Image Description , 2016, WMT.

[17]  Benjamin Van Durme,et al.  Learning Bilingual Lexicons Using the Visual Similarity of Labeled Web Images , 2011, IJCAI.

[18]  Desmond Elliott,et al.  Lessons Learned in Multilingual Grounded Language Learning , 2018, CoNLL.

[19]  Frank Keller,et al.  Disambiguating Visual Verbs , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Slav Petrov,et al.  Syntactic Annotations for the Google Books NGram Corpus , 2012, ACL.

[21]  Roberto Navigli,et al.  Word sense disambiguation: A survey , 2009, CSUR.

[22]  Kobus Barnard,et al.  Word Sense Disambiguation with Pictures , 2003, Artif. Intell..

[23]  Michael J. Denkowski,et al.  Sockeye: A Toolkit for Neural Machine Translation , 2017, ArXiv.

[24]  Lucia Specia,et al.  Multimodal Lexical Translation , 2018, LREC.

[25]  Desmond Elliott,et al.  Assessing multilingual multimodal image description: Studies of native speaker preferences and translator choices , 2018, Natural Language Engineering.

[26]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[27]  Véronique Hoste,et al.  SemEval-2013 Task 10: Cross-lingual Word Sense Disambiguation , 2013, *SEMEVAL.

[28]  Stefan Riezler,et al.  Multimodal Pivots for Image Caption Translation , 2016, ACL.

[29]  Angeliki Lazaridou,et al.  Combining Language and Vision with a Multimodal Skip-gram Model , 2015, NAACL.

[30]  Marine Carpuat,et al.  Improving Statistical Machine Translation Using Word Sense Disambiguation , 2007, EMNLP.

[31]  Adam Kilgarriff,et al.  SENSEVAL: an exercise in evaluating world sense disambiguation programs , 1998, LREC.

[32]  Desmond Elliott,et al.  Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description , 2017, WMT.

[33]  Trevor Darrell,et al.  Unsupervised Learning of Visual Sense Models for Polysemous Words , 2008, NIPS.