What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues

Grounding a pronoun to a visual object it refers to requires complex reasoning from various information sources, especially in conversational scenarios. For example, when people in a conversation talk about something all speakers can see, they often directly use pronouns (e.g., it) to refer to it without previous introduction. This fact brings a huge challenge for modern natural language understanding systems, particularly conventional context-based pronoun coreference models. To tackle this challenge, in this paper, we formally define the task of visual-aware pronoun coreference resolution (PCR) and introduce VisPro, a large-scale dialogue PCR dataset, to investigate whether and how the visual information can help resolve pronouns in dialogues. We then propose a novel visual-aware PCR model, VisCoref, for this task and conduct comprehensive experiments and case studies on our dataset. Results demonstrate the importance of the visual information in this PCR case and show the effectiveness of the proposed model.

[1]  José M. F. Moura,et al.  Visual Coreference Resolution in Visual Dialog using Neural Module Networks , 2018, ECCV.

[2]  Kate Ehrlich,et al.  Search and Inference Strategies in Pronoun Resolution: an Experimental Study , 1981, ACL.

[3]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[4]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[6]  Yuchen Zhang,et al.  CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes , 2012, EMNLP-CoNLL Shared Task.

[7]  Yan Song,et al.  Incorporating Context and External Knowledge for Pronoun Coreference Resolution , 2019, NAACL.

[8]  Christopher D. Manning,et al.  Deep Reinforcement Learning for Mention-Ranking Coreference Models , 2016, EMNLP.

[9]  Heeyoung Lee,et al.  A Multi-Pass Sieve for Coreference Resolution , 2010, EMNLP.

[10]  Claire Cardie,et al.  Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-Art , 2009, ACL.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[13]  Liane Guillou,et al.  Improving Pronoun Translation for Statistical Machine Translation , 2012, EACL.

[14]  Alan L. Yuille,et al.  Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yan Song,et al.  Knowledge-aware Pronoun Coreference Resolution , 2019, ACL.

[16]  Margaret Mitchell,et al.  VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[17]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[18]  José M. F. Moura,et al.  Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Karel Jezek,et al.  Two uses of anaphora resolution in summarization , 2007, Inf. Process. Manag..

[20]  Michael Strube,et al.  A Machine Learning Approach to Pronoun Resolution in Spoken Dialogue , 2003, ACL.

[21]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[22]  Christopher D. Manning,et al.  Entity-Centric Coreference Resolution with Model Stacking , 2015, ACL.

[23]  Luke S. Zettlemoyer,et al.  Higher-Order Coreference Resolution with Coarse-to-Fine Inference , 2018, NAACL.

[24]  Hong Chen,et al.  PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution , 2018, EMNLP.

[25]  Jerry R. Hobbs Resolving pronoun references , 1986 .

[26]  Vincent Ng Supervised Ranking for Pronoun Resolution: Some Recent Improvements , 2005, AAAI.

[27]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[28]  Gareth J. F. Jones,et al.  An investigation of broad coverage automatic pronoun resolution for information retrieval , 2003, SIGIR '03.