Robot reinforcement learning using EEG-based reward signals

Reinforcement learning algorithms have been successfully applied in robotics to learn how to solve tasks from reward signals obtained during task execution. These reward signals are usually modeled by the programmer or provided by supervision. In some situations, however, the reward is hard to encode, and learning then requires a supervised form of reinforcement learning in which a user explicitly enters the reward on each trial. This paper proposes using brain activity recorded by an EEG-based BCI system as the reward signal. The idea is to obtain the reward from the brain activity generated while the user observes the robot solving the task. This process does not require an explicit model of the reward signal, and it can capture subjective aspects that are specific to each user. To achieve this, we designed a new protocol that uses brain activity related to the correct or wrong execution of the task. We show that different levels of error can be detected and classified in single trials, and that reinforcement learning algorithms can learn new, similar tasks using the rewards obtained from brain activity.
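The general idea of learning from a decoded, error-related reward can be illustrated with a minimal sketch. It is not the protocol or the algorithm used in this work: it assumes tabular Q-learning on a toy one-dimensional reaching task, and it replaces the EEG classifier with a hypothetical function, `simulated_eeg_reward`, that returns a binary "correct" or "error" label and mislabels a trial with probability `1 - accuracy`. All names, the task, and the parameter values are illustrative assumptions.

```python
import random

ACTIONS = (-1, +1)  # move one cell left or right along a 1-D track


def simulated_eeg_reward(moved_toward_goal, accuracy, rng):
    """Hypothetical stand-in for a single-trial EEG error classifier.

    Returns +1.0 when the decoded label is "correct" and -1.0 when it is
    "error". The true label is flipped with probability 1 - accuracy to
    mimic single-trial misclassification.
    """
    decoded_correct = moved_toward_goal
    if rng.random() > accuracy:
        decoded_correct = not decoded_correct
    return 1.0 if decoded_correct else -1.0


def train(n_states=5, episodes=200, accuracy=0.8, alpha=0.1, gamma=0.5,
          epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning where the only reward is the decoded label."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt = min(max(state + ACTIONS[a], 0), goal)
            # The goal is to the right, so a rightward move is "correct".
            reward = simulated_eeg_reward(ACTIONS[a] > 0, accuracy, rng)
            # No bootstrapping from the terminal (goal) state.
            target = reward if nxt == goal else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Greedy action (as a step of -1 or +1) for each non-goal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
            for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train(accuracy=0.8)))
```

Lowering `accuracy` toward 0.5 shrinks the expected gap between the rewards for correct and wrong actions, which gives a rough feel for how decoder reliability limits what can be learned from such a signal.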
