Newtonian Action Advice: Integrating Human Verbal Instruction with Reinforcement Learning

A goal of Interactive Machine Learning (IML) is to enable people without specialized training to teach agents how to perform tasks. Many existing machine learning algorithms that learn from human instruction are evaluated using simulated feedback and focus on how quickly the agent learns. This is valuable information, but it ignores important aspects of the human-agent interaction, such as frustration. In this paper, we present the Newtonian Action Advice agent, a new method of incorporating human verbal action advice with Reinforcement Learning (RL) in a way that improves the human-agent interaction. In addition to running simulations, we validated the Newtonian Action Advice algorithm in a human-subject experiment. The results show that Newtonian Action Advice can perform better than Policy Shaping, a state-of-the-art IML algorithm, both on RL metrics such as cumulative reward and on human factors metrics such as frustration.
