Observational Learning by Reinforcement Learning

Observational learning is a type of learning that occurs as a function of observing, retaining, and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been documented in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by observing another agent perform a task in a shared environment. The other agent is observed only through the effect of its actions on the environment and is never explicitly modelled. Two key aspects of observational learning are retained: i) the observer's behaviour needs to change as a result of viewing a 'teacher' (another agent), and ii) the observer needs to be motivated somehow to engage in making use of the other agent's behaviour. The latter is naturally modelled by RL, by correlating the learning agent's reward with the teacher agent's behaviour.
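
As a concrete illustration of this setup (not the paper's actual experiments; the environment, the scripted teacher, and all names such as TwoGoalEnv below are hypothetical), the following minimal Python sketch shows how observational learning can fall out of plain RL plus memory. A scripted teacher drifts toward whichever of two goals is rewarding in the current episode; the learner sees only the teacher's position, remembers a short observation history, and is rewarded for then choosing the correct goal, so Q-learning over observation histories picks up the teacher's cue without any explicit model of the teacher.

```python
import random
from collections import defaultdict

# Hypothetical minimal environment: a scripted "teacher" moves toward
# whichever of two goals is rewarding this episode. The learner never
# models the teacher explicitly; it only observes the teacher's position,
# i.e. the effect of the teacher's actions on the shared environment.

class TwoGoalEnv:
    def __init__(self, watch_steps=3):
        self.watch_steps = watch_steps

    def reset(self):
        self.good_goal = random.choice([0, 1])   # hidden from the learner
        self.teacher_pos = 0.0
        self.t = 0
        return self._obs()

    def _obs(self):
        # The learner observes only the teacher's position, not the goal.
        return round(self.teacher_pos, 1)

    def step_watch(self):
        # Teacher drifts toward the rewarding goal (goal 0 left, goal 1 right).
        self.teacher_pos += 0.3 if self.good_goal == 1 else -0.3
        self.t += 1
        return self._obs(), self.t >= self.watch_steps

    def act(self, choice):
        # After watching, the learner picks a goal; reward only if correct.
        return 1.0 if choice == self.good_goal else 0.0

def train(episodes=5000, eps=0.1, lr=0.1):
    # Memory is modelled crudely: the agent conditions on the tuple of
    # observations seen while watching (a stand-in for a recurrent state).
    Q = defaultdict(lambda: [0.0, 0.0])
    env = TwoGoalEnv()
    for _ in range(episodes):
        env.reset()
        history, done = [], False
        while not done:
            obs, done = env.step_watch()
            history.append(obs)
        state = tuple(history)
        choice = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda a: Q[state][a])
        r = env.act(choice)
        Q[state][choice] += lr * (r - Q[state][choice])
    return Q, env

if __name__ == "__main__":
    Q, env = train()
    correct = 0
    for _ in range(1000):
        env.reset()
        history, done = [], False
        while not done:
            obs, done = env.step_watch()
            history.append(obs)
        choice = max((0, 1), key=lambda a: Q[tuple(history)][a])
        correct += env.act(choice)
    print(f"greedy accuracy: {correct / 1000:.2f}")  # near 1.0 with memory
```

After training, the greedy policy selects the rewarding goal almost every episode, whereas an agent without the observation history can do no better than chance. This mirrors the claim above: memory plus a reward correlated with the teacher's behaviour is enough for observational learning to emerge, with no explicit model of the other agent.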
