A Multi-modal Data-Set for Systematic Analyses of Linguistic Ambiguities in Situated Contexts

Human situated language processing involves the interaction of linguistic and visual processing, and this cross-modal integration helps resolve ambiguities and predict what will be revealed next in an unfolding sentence during spoken communication. However, most state-of-the-art parsing approaches rely solely on the language modality. This paper introduces a multi-modal data-set covering challenging linguistic structures and visual complexities that state-of-the-art parsers should be able to handle. It also briefly describes a multi-modal parsing approach and a proof-of-concept study showing the contribution of visual information to disambiguation.
