Improving Grounded Natural Language Understanding through Human-Robot Dialog

Natural language understanding for robotics can require substantial domain- and platform-specific engineering. For example, for mobile robots to pick and place objects in an environment to satisfy human commands, we can specify the language humans use to issue such commands and connect concept words like "red can" to physical object properties. One way to alleviate this engineering for a new domain is to enable robots in human environments to adapt dynamically: continually learning new language constructions and perceptual concepts. In this work, we present an end-to-end pipeline for translating natural language commands to discrete robot actions, and we use clarification dialogs to jointly improve language parsing and concept grounding. We train and evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we transfer the learned agent to a physical robot platform to demonstrate it in the real world.
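
The pipeline described above can be pictured as a parse-then-ground loop with a clarification fallback. The sketch below is purely illustrative and is not the paper's implementation: the parse and ground functions, the scoring heuristic, and the confidence threshold are hypothetical placeholders standing in for a trained semantic parser and learned perceptual concept models.

```python
# A minimal, illustrative sketch (not the paper's implementation) of a
# command-to-action pipeline with a clarification step.  The function names,
# scoring heuristic, and threshold below are hypothetical stand-ins.
from dataclasses import dataclass


@dataclass
class Action:
    verb: str        # e.g. "deliver"
    patient: str     # grounded object identifier, e.g. "red_can_1"
    recipient: str   # grounded location or person identifier


def parse(command: str) -> dict:
    """Placeholder semantic parser: map a command string to a shallow frame.
    A real system would produce a logical form (e.g. from a trained parser)."""
    return {"verb": "deliver",
            "patient_phrase": "red can",
            "recipient_phrase": "kitchen"}


def ground(phrase: str, candidates: list) -> tuple:
    """Placeholder grounder: score each candidate object against the phrase.
    A real system would apply learned perceptual concept classifiers."""
    scores = {c: (0.9 if phrase.replace(" ", "_") in c else 0.2) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]


def command_to_action(command: str, objects: list, threshold: float = 0.7) -> Action:
    """Translate a command to a discrete Action, asking a clarification
    question when grounding confidence falls below the threshold."""
    frame = parse(command)
    obj, conf = ground(frame["patient_phrase"], objects)
    if conf < threshold:
        # The user's answer can be logged as a new training example,
        # improving both the parser and the concept models over time.
        answer = input(f"Did you mean {obj}? (yes/no) ").strip().lower()
        if answer != "yes":
            obj = input("Please name the object you meant: ").strip()
    return Action(frame["verb"], obj, frame["recipient_phrase"])


if __name__ == "__main__":
    print(command_to_action("deliver the red can to the kitchen",
                            ["red_can_1", "blue_mug_1"]))
```

In the system described by the abstract, the answers gathered at the clarification step serve as supervision, so that each dialog jointly refines language parsing and perceptual concept grounding.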
