Simultaneous Control and Human Feedback in the Training of a Robotic Agent with Actor-Critic Reinforcement Learning

This paper contributes a preliminary report on the advantages and disadvantages of incorporating simultaneous human control and feedback signals into the training of a reinforcement learning robotic agent. While robotic human-machine interfaces have grown increasingly complex in both form and function, control remains challenging for users, and a widening gap has emerged between the control signals a user can supply and the number of robotic actuators to be commanded. One way to address this gap is to shift some autonomy to the robot: semi-autonomous actions of the robotic agent can then be shaped by human feedback, simplifying user control. Most prior work on human shaping of learning agents has relied on feedback alone, or has included only indirect control signals. By contrast, in this paper we explore how a human can provide concurrent feedback signals and real-time myoelectric control signals to train a robot's actor-critic reinforcement learning control system. Using both a physical and a simulated robotic system, we compare training performance on a simple movement task when reward is derived from the environment, when reward is provided by the human, and when it is a combination of the two. Our results indicate that some benefit can be gained by including human-generated feedback.
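As a rough illustration of the reward-blending idea compared in the abstract, the sketch below shows a minimal continuous-action actor-critic in Python whose scalar reward is a weighted combination of an environment-derived term and a human-delivered feedback term. This is a sketch under stated assumptions, not the paper's implementation: it assumes linear function approximation, a fixed-variance Gaussian policy, and TD(0) updates, and the names (`ActorCritic`, `combined_reward`) and the mixing weight `beta` are hypothetical.

```python
import numpy as np

class ActorCritic:
    """Minimal continuous-action actor-critic (illustrative sketch).

    Linear function approximation over a feature vector x, a TD(0)
    critic, and a Gaussian policy with a fixed exploration std. dev.
    """

    def __init__(self, n_features, alpha_v=0.1, alpha_mu=0.01,
                 sigma=0.5, gamma=0.99):
        self.w = np.zeros(n_features)      # critic weights (state value)
        self.theta = np.zeros(n_features)  # actor weights (policy mean)
        self.alpha_v = alpha_v             # critic step size
        self.alpha_mu = alpha_mu           # actor step size
        self.sigma = sigma                 # fixed policy std. dev.
        self.gamma = gamma                 # discount factor

    def act(self, x):
        # Sample a continuous action from a Gaussian centred on theta . x
        return np.random.normal(self.theta @ x, self.sigma)

    def update(self, x, a, r, x_next):
        # TD error drives both the critic and the actor updates
        delta = r + self.gamma * (self.w @ x_next) - self.w @ x
        self.w += self.alpha_v * delta * x
        # grad of log N(a; theta.x, sigma) w.r.t. theta, scaled by delta
        self.theta += self.alpha_mu * delta * \
            ((a - self.theta @ x) / self.sigma ** 2) * x


def combined_reward(r_env, r_human, beta=0.5):
    """Blend environment reward with human feedback.

    `beta` is a hypothetical mixing weight, not a value from the paper;
    beta = 0 gives environment-only reward, beta = 1 human-only.
    """
    return (1.0 - beta) * r_env + beta * r_human


# Example of one training step with a blended reward (illustrative values)
ac = ActorCritic(n_features=4)
x = np.ones(4)
a = ac.act(x)
r = combined_reward(r_env=1.0, r_human=-0.5, beta=0.5)
ac.update(x, a, r, x_next=np.zeros(4))
```

In this sketch, setting `beta` to 0 or 1 recovers the environment-only and human-only reward conditions, and intermediate values correspond to the combined conditions compared in the experiments described above.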
