Combining artificial curiosity and tutor guidance for environment exploration

In a new environment, an artificial agent should both explore autonomously and exploit tutoring signals from human caregivers. While these two mechanisms have mainly been studied in isolation, we show in this paper that a carefully designed combination of the two performs better than either one separately. To this end, we propose an autonomous agent whose actions result from a user-defined weighted combination of two drives: a tendency for gaze-following behaviors in the presence of a tutor, and a novelty-based intrinsic curiosity. Both drives are incorporated into a model-based reinforcement learning framework through reward shaping. The agent is evaluated on a discretized pick-and-place task in order to explore the effects of various combinations of the two drives. Results show that a properly tuned combination leads to faster and more consistent discovery of the task than either drive in isolation. Additionally, experiments in a reward-free version of the environment indicate that combining curiosity and gaze-following behaviors is a promising path toward real-life exploration in artificial agents.
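As a rough illustration of the reward-shaping scheme described above, the sketch below adds a user-weighted gaze-following bonus and a novelty bonus to the extrinsic task reward. The abstract does not specify the exact bonus definitions, so the class name, the count-based 1/sqrt(n) novelty term, and the binary gaze bonus are illustrative assumptions, not the paper's formulation.

```python
import math

# A minimal sketch of the weighted reward-shaping idea: the bonus
# definitions and default weights below are illustrative assumptions,
# not the paper's exact formulation.
class ShapedReward:
    def __init__(self, w_gaze=0.5, w_curiosity=0.5):
        self.w_gaze = w_gaze            # weight of the gaze-following drive
        self.w_curiosity = w_curiosity  # weight of the curiosity drive
        self.visit_counts = {}          # state-visitation counts for novelty

    def curiosity_bonus(self, state):
        # Count-based novelty: rarely visited states yield larger bonuses.
        n = self.visit_counts.get(state, 0)
        self.visit_counts[state] = n + 1
        return 1.0 / math.sqrt(n + 1)

    def gaze_bonus(self, state, tutor_gaze_target):
        # Reward reaching the state (e.g. object) the tutor is gazing at.
        if tutor_gaze_target is None:   # no tutor present
            return 0.0
        return 1.0 if state == tutor_gaze_target else 0.0

    def shape(self, extrinsic_reward, state, tutor_gaze_target=None):
        # Shaped reward = task reward + user-weighted sum of the two drives.
        return (extrinsic_reward
                + self.w_gaze * self.gaze_bonus(state, tutor_gaze_target)
                + self.w_curiosity * self.curiosity_bonus(state))
```

Setting `w_gaze` or `w_curiosity` to zero recovers either drive in isolation, so sweeping these two weights is one natural way to realize the combinations the evaluation compares.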
