Reasoning about Pragmatics with Neural Listeners and Speakers

We present a model for pragmatically describing scenes, in which contrastive behavior results from a combination of inference-driven pragmatics and learned semantics. Like previous learned approaches to language generation, our model uses a simple feature-driven architecture (here a pair of neural "listener" and "speaker" models) to ground language in the world. Like inference-driven approaches to pragmatics, our model actively reasons about listener behavior when selecting utterances. For training, our approach requires only ordinary captions, annotated _without_ demonstration of the pragmatic behavior the model ultimately exhibits. In human evaluations on a referring expression game, our approach succeeds 81% of the time, compared to a 69% success rate using existing techniques.
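The pragmatic selection step described above is, at its core, a sample-and-rerank procedure: the speaker model proposes candidate descriptions, and the listener model scores how reliably each one would pick out the target scene over a distractor. Below is a minimal sketch of that idea, not the paper's implementation; the names `pragmatic_describe`, `sample_speaker`, and `listener_prob`, and the mixing weight `lam`, are illustrative assumptions.

```python
import math
import random

def pragmatic_describe(target, distractor, sample_speaker, listener_prob,
                       n_samples=10, lam=0.5):
    """Sample candidate utterances from the speaker, then rerank by a
    weighted combination of listener accuracy and speaker fluency.
    (Hypothetical interface; the paper's models are neural networks.)"""
    best, best_score = None, -math.inf
    for _ in range(n_samples):
        utterance, speaker_logp = sample_speaker(target)
        # Listener term: probability the listener resolves the utterance
        # to the target rather than the distractor.
        p_t = listener_prob(utterance, target)
        p_d = listener_prob(utterance, distractor)
        listener_logp = math.log(p_t / (p_t + p_d + 1e-9) + 1e-9)
        # Interpolate discriminativeness against fluency.
        score = lam * listener_logp + (1 - lam) * speaker_logp
        if score > best_score:
            best, best_score = utterance, score
    return best

# Toy usage with stand-in models, purely for illustration.
if __name__ == "__main__":
    vocab = ["the square", "the red square", "the square on the left"]
    def sample_speaker(scene):
        u = random.choice(vocab)
        return u, math.log(1.0 / len(vocab))
    def listener_prob(utterance, scene):
        # Pretend more specific utterances identify the target better.
        return 0.9 if "left" in utterance and scene == "target" else 0.5
    print(pragmatic_describe("target", "distractor",
                             sample_speaker, listener_prob))
```

Trading the listener term off against the speaker term is what keeps the chosen utterance both discriminative (it distinguishes the target) and fluent (it still looks like a natural caption), which matches the division of labor between the two models in the abstract.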
