Experience Grounds Language

Successful linguistic communication relies on a shared experience of the world, and it is this shared experience that makes utterances meaningful. Despite the incredible effectiveness of language processing models trained on text alone, today's best systems still make mistakes that arise from a failure to relate language to the physical world it describes and to the social interactions it facilitates. Natural Language Processing is a diverse field, and progress throughout its development has come from new representational theories, modeling techniques, data collection paradigms, and tasks. We posit that the present success of representation learning approaches trained on large text corpora can be deeply enriched by the parallel tradition of research on the contextual and social nature of language. In this article, we consider work on the contextual foundations of language: grounding, embodiment, and social interaction. We describe a brief history and possible progression of how contextual information can factor into our representations, with an eye towards how this integration can move the field forward and where it is currently being pioneered. We believe this framing will serve as a roadmap for truly contextual language understanding.
