Reference and coreference in situated dialogue

In recent years, several corpora have been developed for vision and language tasks. We argue that there is still significant room for corpora that increase the complexity of both the visual and the linguistic domains and that capture different varieties of perceptual and conversational contexts. Working with two corpora that approach this goal, we present a linguistic perspective on some of the challenges in creating and extending resources that combine language and vision while preserving continuity with existing best practices in coreference annotation.
