CLIC: Curriculum Learning and Imitation for Object Control in Non-Rewarding Environments

In this paper we study a new reinforcement learning setting in which the environment provides no reward and contains several possibly related objects of varying controllability, in which a competent agent, Bob, pursues its own goals without necessarily providing helpful demonstrations, and in which the learning agent's objective is to gain control of each object individually. We present a generic discrete-state, discrete-action model of such environments, and an unsupervised reinforcement learning agent, CLIC (Curriculum Learning and Imitation for Control), designed to achieve this objective. CLIC selects which objects to focus on, both when training and when imitating, by maximizing its learning progress. We show that by observing Bob, CLIC gains control of objects faster, even though Bob is not explicitly teaching. Despite choosing what to imitate in this principled way, CLIC retains the ability to follow Bob when he provides ordered demonstrations. Finally, we show that when Bob controls objects the agent cannot, or when the objects in the environment are organized hierarchically, CLIC masters the environment faster than a non-curriculum agent, because it ignores non-reproducible and already-mastered object interactions when imitating.
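The object-selection mechanism described above can be sketched concretely. The snippet below is a minimal illustration under stated assumptions, not CLIC's actual implementation: it assumes a binary success signal per goal-reaching attempt and measures learning progress as the change in empirical success rate between the older and newer halves of a sliding window. The class name, window size, and epsilon-greedy sampling are all hypothetical, illustrative choices.

```python
import random
from collections import deque


class LearningProgressSelector:
    """Choose which object to practice or imitate next, favoring
    objects whose control competence is changing fastest.

    Hypothetical sketch: the competence measure, window size, and
    exploration rate are illustrative, not taken from the paper.
    """

    def __init__(self, object_ids, window=50, epsilon=0.1):
        self.epsilon = epsilon  # residual uniform exploration
        # Per-object sliding window of recent goal-reaching outcomes.
        self.outcomes = {o: deque(maxlen=window) for o in object_ids}

    def record(self, object_id, success):
        """Log one attempt: success is 1.0 if the goal for this
        object was reached, 0.0 otherwise."""
        self.outcomes[object_id].append(float(success))

    def _progress(self, object_id):
        hist = list(self.outcomes[object_id])
        if len(hist) < 2:
            return 1.0  # unexplored objects look maximally promising
        half = len(hist) // 2
        older = sum(hist[:half]) / half
        recent = sum(hist[half:]) / (len(hist) - half)
        # Absolute change in success rate across the window.
        return abs(recent - older)

    def select(self):
        """Epsilon-greedy choice over learning progress."""
        objects = list(self.outcomes)
        if random.random() < self.epsilon:
            return random.choice(objects)
        return max(objects, key=self._progress)
```

Under this scheme, objects whose competence curve is flat, whether already mastered or currently uncontrollable (e.g., objects only Bob can affect), show near-zero progress and are rarely selected, which is one way a curriculum-based agent can skip non-reproducible and already-mastered interactions.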
