Intrinsic interactive reinforcement learning – Using error-related potentials for real world human-robot interaction

Reinforcement learning (RL) enables robots to learn their optimal behavioral strategies in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, continuously and explicitly generating feedback is demanding and tiresome for a human. The development of implicit approaches is therefore highly relevant. In this paper, we used the error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as intrinsically generated implicit feedback (reward) for RL. We first validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient for the system to learn, in parallel, to recognize gestures and the correct mapping between human gestures and robot actions. We then validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the gesture-action mapping (ErrP detection: 90% bACC). We demonstrated that intrinsically generated EEG-based human feedback can successfully be used in RL to implicitly improve gesture-based robot control during human-robot interaction. We call this approach intrinsic interactive RL.
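
To make the learning loop concrete, below is a minimal sketch, not the paper's actual implementation: the EEG pipeline is replaced by a stochastic oracle whose accuracy mirrors the reported ~90% bACC, the learner is a simple tabular epsilon-greedy bandit, and the parallel gesture-recognition learning is omitted. All names, constants, and the mapping itself are illustrative assumptions.

```python
import random

# Hypothetical gesture and action sets; the robot must learn which action
# each gesture should trigger, using only implicit EEG feedback.
GESTURES = ["wave", "point", "stop", "circle"]
ACTIONS = ["approach", "turn", "halt", "rotate"]
TRUE_MAPPING = {"wave": "approach", "point": "turn",
                "stop": "halt", "circle": "rotate"}

BACC = 0.90  # simulated single-trial ErrP detection accuracy (~90% bACC reported)

def simulated_errp_reward(gesture, action):
    """Stand-in for online single-trial ErrP classification: returns -1 when
    an ErrP is detected (robot acted wrongly), +1 otherwise. The detection is
    flipped with probability 1 - BACC to mimic imperfect classification."""
    error_made = TRUE_MAPPING[gesture] != action
    detected_error = error_made if random.random() < BACC else not error_made
    return -1.0 if detected_error else 1.0

# Tabular action values Q[gesture][action]; alpha and epsilon are illustrative.
Q = {g: {a: 0.0 for a in ACTIONS} for g in GESTURES}
alpha, epsilon = 0.3, 0.1

for trial in range(500):
    gesture = random.choice(GESTURES)      # subject freely chooses a gesture
    if random.random() < epsilon:          # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(Q[gesture], key=Q[gesture].get)
    reward = simulated_errp_reward(gesture, action)  # implicit EEG feedback
    Q[gesture][action] += alpha * (reward - Q[gesture][action])

learned = {g: max(Q[g], key=Q[g].get) for g in GESTURES}
print("learned mapping:", learned)
```

With noisy rewards at 90% accuracy, the expected reward of the correct action (about +0.8) still dominates that of any wrong action (about -0.8), so the mapping is reliably recovered within a few hundred trials; this is the intuition behind why imperfect single-trial ErrP detection suffices as a reward signal.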
