Applying Machine Learning to the Choice of Size Modifiers

People use different size modifiers to refer to objects of different sizes; “the long table” is likely to be a different table from “the small table”. However, the factors that govern the choice of size modifier have not yet been identified. When is something “long”, and when is something “small”? We establish a connection between the visible dimensions of objects and the language people use to refer to them. First, we conduct an experiment to elicit size-denoting modifiers for images of real-world objects. We find that decision trees effectively model the relationship between dimensional features and modifier choice. The images are then used as input to an object segmentation algorithm, and we compare how well speakers’ behavior can be predicted from the real-world measurements of the pictured objects versus pixel-based measurements of the images. Real-world measurements are the best predictors of modifier choice, suggesting that people infer real-world size from images. However, automatically extracted pixel measurements also predict modifier choice relatively well, offering a potential connection between computer vision and natural language. When speaker identity is taken into account, modifier choice can be predicted with even greater accuracy (around 75%), and the difference between automatically extracted and real-world measurements is no longer significant.
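To make the modeling step concrete, here is a minimal sketch of a decision-tree classifier that maps an object's dimensions to a size modifier. The abstract does not say which tree-induction algorithm or features were used, so this assumes a small CART-style tree with Gini impurity over numeric features; the names `build_tree`, `predict`, and `loo_accuracy`, the feature pair (width, height in cm), and the toy data are all hypothetical and are not the study's data or results. Leave-one-out accuracy is included because comparing feature sets (for example, real-world versus pixel measurements) amounts to scoring the same learner on different inputs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, ...]  # hypothetical feature vector, e.g. (width_cm, height_cm)


@dataclass
class Node:
    label: str                      # majority modifier at this node
    feature: Optional[int] = None   # feature index tested here; None means leaf
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of modifier labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[Row], y: List[str]):
    """Return (weighted_impurity, feature, threshold) of the best binary split."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X: List[Row], y: List[str], depth: int = 0,
               max_depth: int = 3, min_size: int = 2) -> Node:
    """Grow a tree recursively until nodes are pure, small, or too deep."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return Node(majority)
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):  # no split reduces impurity
        return Node(majority)
    _, f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return Node(
        majority, f, t,
        build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size),
        build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size),
    )


def predict(node: Node, x: Row) -> str:
    """Follow threshold tests from the root down to a leaf label."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def loo_accuracy(X: List[Row], y: List[str], **kw) -> float:
    """Leave-one-out accuracy: train without item i, then predict item i."""
    hits = 0
    for i in range(len(y)):
        tree = build_tree(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **kw)
        hits += predict(tree, X[i]) == y[i]
    return hits / len(y)


# Hypothetical toy data (width_cm, height_cm) -> modifier; NOT from the study.
X = [(200, 40), (180, 35), (220, 50),   # "long"
     (30, 150), (25, 180), (40, 160),   # "tall"
     (10, 8), (12, 10), (8, 6)]         # "small"
y = ["long"] * 3 + ["tall"] * 3 + ["small"] * 3

tree = build_tree(X, y)
print(predict(tree, (190, 45)))  # -> long
print(predict(tree, (35, 170)))  # -> tall
print(loo_accuracy(X, y))        # -> 1.0
```

To compare measurement sources in this sketch, one would build one `X` from real-world dimensions and another from segmentation-derived pixel dimensions for the same items and call `loo_accuracy` on each; speaker identity would need a categorical encoding (for example, one indicator feature per speaker) rather than a single integer, since the thresholds above assume ordered values.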
