Grounded language interpretation of robotic commands through structured learning

Abstract The presence of robots in everyday life is increasing at a growing pace. Industrial and working environments, as well as health-care assistance in public or domestic settings, can benefit from robots' services to accomplish manifold tasks that are difficult or tedious for humans. In such scenarios, the Natural Language interactions that enable collaboration and robot control are meant to be situated, in the sense that both the user and the robot access and make reference to the shared environment. Contextual knowledge may thus play a key role in resolving inherent ambiguities of grounded language, such as prepositional phrase attachment. In this work, we present a linguistic pipeline for the semantic processing of robotic commands that combines discriminative structured learning, distributional semantics and contextual evidence extracted from the working environment. The final goal is to make the interpretation of linguistic exchanges depend on physical, cognitive and language-dependent aspects. We present, formalize and discuss an adaptive Spoken Language Understanding chain for robotic commands that explicitly depends on the operational context during both the learning and processing stages. The resulting framework makes it possible to model heterogeneous information about the environment (e.g., positional information about objects and their properties) and to inject it into the learning process. Empirical results demonstrate a significant contribution of these additional dimensions, achieving up to 25% relative error reduction with respect to a pipeline that exploits only linguistic evidence.
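To illustrate the kind of context injection the abstract refers to, the following is a minimal, hypothetical Python sketch (not the authors' pipeline; it assumes scikit-learn and a toy semantic map) in which environment-derived features are combined with lexical features to decide the attachment of a prepositional phrase such as "on the table". All object names, feature templates and training labels below are invented for illustration.

# Hypothetical sketch: contextual evidence from a toy semantic map is merged
# with linguistic features to classify prepositional phrase attachment.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def features(verb, obj, prep, landmark, semantic_map):
    """Combine linguistic evidence with contextual evidence from the map."""
    obj_props = semantic_map.get(obj, {})
    return {
        "verb=" + verb: 1.0,
        "prep=" + prep: 1.0,
        "obj=" + obj: 1.0,
        "landmark=" + landmark: 1.0,
        # contextual feature: is the object already located on the landmark?
        "obj_on_landmark=" + str(obj_props.get("on") == landmark): 1.0,
    }

# Toy semantic maps describing two different states of the environment.
map_book_on_table = {"book": {"on": "table"}}
map_book_on_shelf = {"book": {"on": "shelf"}}

# Invented training set: the same sentence can receive different attachments
# depending on the context in which it is uttered.
X = [
    features("bring", "book", "on", "table", map_book_on_table),  # "the book [on the table]"
    features("bring", "book", "on", "table", map_book_on_shelf),  # "bring ... [onto the table]"
    features("put", "mug", "on", "table", {"mug": {"on": "shelf"}}),
    features("take", "cup", "on", "desk", {"cup": {"on": "desk"}}),
]
y = ["NOUN_ATTACH", "VERB_ATTACH", "VERB_ATTACH", "NOUN_ATTACH"]

vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X), y)

# At prediction time the same utterance is disambiguated by the current map.
test = features("bring", "book", "on", "table", map_book_on_shelf)
print(clf.predict(vec.transform([test])))  # likely VERB_ATTACH (goal of the motion) under this toy model

The point of the sketch is only the design choice it mirrors: once features computed from the current state of the environment enter the learner's feature space, the same utterance can be assigned different interpretations in different contexts.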
