Behavior Adaptation for a Socially Interactive Robot

Henrik Christensen

Swedish title: Beteendeanpassning för en socialt interaktiv robot

This report addresses the problem of making a humanoid robot learn a human partner's preferences regarding personal space and adapt to them in real time. An adaptive system based on policy gradient reinforcement learning (PGRL) is proposed, implemented, and evaluated in an experiment with human subjects. The experiment shows that the approach is viable, although some issues remain to be resolved.
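To make the idea of policy gradient adaptation concrete, the following is a minimal sketch, not the report's implementation. It assumes a single behavior parameter (an interaction distance in meters) and applies a finite-difference policy gradient update in one dimension. The function names `simulated_reward` and `pgrl_step`, the synthetic reward, and all numeric values (`preferred_m`, `epsilon`, `step_size`, `n_policies`) are hypothetical and chosen only for illustration. In a real system the reward would have to come from observations of the human partner.

```python
import random


def simulated_reward(distance_m, preferred_m=1.2):
    """Hypothetical stand-in for human feedback: best at the preferred distance."""
    return -(distance_m - preferred_m) ** 2


def pgrl_step(theta, reward_fn, epsilon=0.05, step_size=0.05,
              n_policies=10, rng=random):
    """One finite-difference policy gradient update on a 1-D parameter.

    Evaluates randomly perturbed parameters (theta - epsilon, theta,
    theta + epsilon), averages the reward per perturbation, and moves
    theta by a fixed step toward the better-scoring side.
    """
    rewards = {-1: [], 0: [], 1: []}
    for _ in range(n_policies):
        sign = rng.choice((-1, 0, 1))
        rewards[sign].append(reward_fn(theta + sign * epsilon))

    def mean(values):
        return sum(values) / len(values) if values else None

    avg_minus, avg_zero, avg_plus = (mean(rewards[s]) for s in (-1, 0, 1))

    # Without samples on both sides, the gradient direction cannot be estimated.
    if avg_minus is None or avg_plus is None:
        return theta
    # Keep theta if leaving it unperturbed scores at least as well as either side.
    if avg_zero is not None and avg_zero >= avg_plus and avg_zero >= avg_minus:
        return theta
    # No preference between the two sides: do not move.
    if avg_plus == avg_minus:
        return theta
    direction = 1 if avg_plus > avg_minus else -1
    return theta + step_size * direction


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 2.0  # hypothetical initial interaction distance in meters
    for _ in range(200):
        theta = pgrl_step(theta, simulated_reward, rng=rng)
    print(f"adapted distance: {theta:.2f} m")
```

With a noise-free synthetic reward the parameter settles close to the assumed preferred distance. With real human feedback the reward would be noisy, so more evaluations per update or a smaller step size would likely be needed.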
