Reinforcement Learning Approaches in Social Robotics

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts with its environment through trial and error to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, reinforcement learning can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is limited to studies that include physical social robots and real-world human-robot interactions with users. We present a thorough analysis of these approaches and, in addition to the survey, categorize existing reinforcement learning approaches by the method used and the design of the reward mechanism. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers by the communication medium used for reward formulation. Considering the importance of designing the reward function, we also categorize the papers by the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The paper also covers the benefits and challenges of reinforcement learning in social robotics; the evaluation methods of the surveyed papers, in terms of whether they use subjective and algorithmic measures; a discussion in light of real-world reinforcement learning challenges and proposed solutions; and the points that remain to be explored, including approaches that have so far received less attention. This paper thus aims to be a starting point for researchers interested in applying reinforcement learning methods in this research field.
