Information Structure Prediction for Visual-world Referring Expressions

We investigate the order in which objects are mentioned in relational descriptions of visual scenes. Existing work in the visual domain focuses on content selection for text generation and relies primarily on templates to generate surface realizations from underlying content choices. In contrast, we seek to clarify the influence of visual perception on the linguistic form (as opposed to the content) of descriptions, modeling both the variation in and the constraints on surface orderings within a description. We find previously unknown effects of the visual characteristics of objects; specifically, when a relational description involves a visually salient object, that object is more likely to be mentioned first. We analyze these patterns in detail using logistic regression, and we also train and evaluate a classifier. Our methods yield a significant improvement in classification accuracy over a naive baseline.
