Learning to Generate Compositional Color Descriptions

The production of color language is essential for grounded language generation. Color descriptions have many challenging properties: they can be vague, compositionally complex, and denotationally rich. We present an effective approach to generating color descriptions using recurrent neural networks and a Fourier-transformed color representation. Our model outperforms previous work on a conditional language modeling task over a large corpus of naturalistic color descriptions. In addition, probing the model's output reveals that it can accurately produce not only basic color terms but also descriptors with non-convex denotations ("greenish"), bare modifiers ("bright", "dull"), and compositional phrases ("faded teal") not seen in training.
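
One plausible reading of a "Fourier-transformed color representation" is sketched below. It is an illustration under stated assumptions, not the model's actual feature pipeline. The function name `fourier_color_features`, the use of three color coordinates scaled to [0, 1] (for example hue, saturation, value), and the choice of three integer frequencies per axis are all hypothetical and are not specified in the text above.

```python
import cmath
import itertools
import math


def fourier_color_features(h, s, v, resolution=3):
    """Map a color with coordinates in [0, 1] to Fourier features.

    Assumption (not from the source text): for every integer frequency
    triple (j, k, l) with each component in range(resolution), compute
    exp(-2*pi*i*(j*h + k*s + l*v)) and keep the real and imaginary
    parts as separate real-valued features.
    """
    real, imag = [], []
    for j, k, l in itertools.product(range(resolution), repeat=3):
        z = cmath.exp(-2j * math.pi * (j * h + k * s + l * v))
        real.append(z.real)
        imag.append(z.imag)
    # All real parts first, then all imaginary parts.
    return real + imag


# Example: a color with hue 0.5, saturation 0.8, value 0.6.
features = fourier_color_features(0.5, 0.8, 0.6)
```

With `resolution=3` this yields 2 × 3³ = 54 features, each in [-1, 1]. Because the frequencies are integers, the features are periodic in each coordinate, so a hue of 0.0 and a hue of 1.0 map to the same vector. That property suits a circular dimension such as hue, and may help a network represent denotations that are not convex in the raw color space.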
