Robust comprehension of natural language instructions by a domestic service robot

Graphical Abstract

We propose a method that enables domestic service robots to comprehend natural language instructions spoken by a human user. We consider natural language instructions intuitive and, therefore, one of the most user-friendly ways to instruct a robot. However, each action type can be expressed in a variety of ways; for example, the instruction ‘Go to the kitchen’ can also be expressed as ‘Move to the kitchen.’ The proposed method combines action-type classification, which is based on a support vector machine, with slot extraction, which is based on conditional random fields; both are required for a robot to execute an action. Further, by considering the co-occurrence relationship between the action type and the slots together with the speech recognition score, the proposed method can avoid degradation of the robot’s comprehension accuracy in noisy environments, where inaccurate speech recognition can be problematic. We conducted experiments on a Japanese instruction dataset collected through a questionnaire-based survey. The experimental results show that, in a noisy environment, the robot’s comprehension accuracy is higher with our method than with a baseline method that uses only the 1-best speech recognition result.
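The idea of weighing the speech recognition score against the action type and slot co-occurrence can be illustrated with a minimal sketch. The abstract does not say how the scores are combined, so everything below is an assumption made for illustration: the sum of log scores, the co-occurrence table, the probability floor, and the example hypotheses (including the misrecognition ‘chicken’) are all hypothetical. The `action_probs` and `slots` fields stand in for the outputs of the support vector machine and the conditional random fields, which are not implemented here.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float     # recognizer confidence in (0, 1] (hypothetical values)
    action_probs: dict   # action type -> probability (stand-in for SVM output)
    slots: frozenset     # slot names found (stand-in for CRF output)


# Hypothetical co-occurrence table: how plausible a slot set is for an action type.
COOCCURRENCE = {
    ("go", frozenset({"destination"})): 0.9,
    ("go", frozenset({"object"})): 0.05,
    ("bring", frozenset({"object"})): 0.6,
    ("bring", frozenset({"object", "destination"})): 0.35,
}
FLOOR = 1e-3  # assumed floor for unseen combinations, avoids log(0)


def score(hyp, action):
    """Assumed combination: sum of log ASR score, log action probability,
    and log co-occurrence plausibility."""
    cooc = COOCCURRENCE.get((action, hyp.slots), FLOOR)
    return (math.log(hyp.asr_score)
            + math.log(hyp.action_probs.get(action, FLOOR))
            + math.log(cooc))


def comprehend(nbest):
    """Pick the best (action, hypothesis) pair over the whole N-best list."""
    candidates = [(score(h, a), a, h) for h in nbest for a in h.action_probs]
    _, action, hyp = max(candidates, key=lambda c: c[0])
    return action, hyp


def baseline(nbest):
    """1-best baseline: trust the top ASR hypothesis, then pick its best action."""
    top = max(nbest, key=lambda h: h.asr_score)
    action = max(top.action_probs, key=top.action_probs.get)
    return action, top


# Hypothetical noisy N-best list for the instruction 'Go to the kitchen'.
NBEST = [
    Hypothesis("go to the chicken", 0.50, {"go": 0.7, "bring": 0.3},
               frozenset({"object"})),
    Hypothesis("go to the kitchen", 0.45, {"go": 0.8, "bring": 0.2},
               frozenset({"destination"})),
]

print(baseline(NBEST)[1].text)    # top ASR hypothesis, regardless of plausibility
print(comprehend(NBEST)[1].text)  # hypothesis whose slots fit the action type
```

In this toy setting, the 1-best baseline keeps the misrecognized hypothesis because it has the highest recognition score, whereas the combined score prefers the lower-ranked hypothesis whose slots are consistent with the ‘go’ action.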
