A deep learning approach for understanding natural language commands for mobile service robots

Using natural language to give instructions to robots is challenging, since natural language understanding is still largely an open problem. In this paper we address this problem by restricting our attention to commands modeled as one action, plus arguments (also known as slots). For action detection (also called intent detection) and slot filling various architectures of Recurrent Neural Networks and Long Short Term Memory (LSTM) networks were evaluated, having LSTMs achieved a superior accuracy. As the action requested may not fall within the robots capabilities, a Support Vector Machine(SVM) is used to determine whether it is or not. For the input of the neural networks, several word embedding algorithms were compared. Finally, to implement the system in a robot, a ROS package is created using a SMACH state machine. The proposed system is then evaluated both using well-known datasets and benchmarks in the context of domestic service robots.

[1]  Geoffrey Zweig,et al.  Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[2]  Giuseppe Riccardi,et al.  Generative and discriminative algorithms for spoken language understanding , 2007, INTERSPEECH.

[3]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[4]  Roberto Basili,et al.  A Discriminative Approach to Grounded Spoken Language Understanding in Interactive Robotics , 2016, IJCAI.

[5]  Chin-Hui Lee,et al.  Discriminative training of natural language call routers , 2003, IEEE Trans. Speech Audio Process..

[6]  Gökhan Tür,et al.  Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM , 2016, INTERSPEECH.

[7]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[8]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[9]  Gökhan Tür,et al.  Optimizing SVMs for complex call classification , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[10]  Pedro U. Lima,et al.  A Domestic Assistive Robot Developed Through Robot Competitions ∗ , 2016 .

[11]  Stephanie Seneff,et al.  TINA: A Natural Language System for Spoken Language Applications , 1992, Comput. Linguistics.

[12]  Douglas E. Appelt,et al.  GEMINI: A Natural Language System for Spoken-Language Understanding , 1993, ACL.

[13]  Slav Petrov,et al.  Globally Normalized Transition-Based Neural Networks , 2016, ACL.

[14]  D. Nardi,et al.  Perceptually Informed Spoken Language Understanding for Service Robotics , 2016 .

[15]  Giuseppe Riccardi,et al.  How may I help you? , 1997, Speech Commun..

[16]  Bob Carpenter,et al.  Vector-based Natural Language Call Routing , 1999, Comput. Linguistics.

[17]  Luca Iocchi,et al.  RoboCup@Home: Scientific Competition and Benchmarking for Domestic Service Robots , 2009 .

[18]  Geoffrey Zweig,et al.  Spoken language understanding using long short-term memory neural networks , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[19]  Luca Iocchi,et al.  Benchmarking Intelligent Service Robots through Scientific Competitions: The RoboCup@Home Approach , 2013, AAAI Spring Symposium: Designing Intelligent Robots.

[20]  Mitchell P. Marcus,et al.  Text Chunking using Transformation-Based Learning , 1995, VLC@ACL.

[21]  Pedro U. Lima,et al.  RoCKIn and the European Robotics League: Building on RoboCup Best Practices to Promote Robot Competitions in Europe , 2016, RoboCup.

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[24]  Andreas Stolcke,et al.  Recurrent neural network and LSTM models for lexical utterance classification , 2015, INTERSPEECH.