Curious Meta-Controller: Adaptive Alternation between Model-Based and Model-Free Control in Deep Reinforcement Learning

Recent success in deep reinforcement learning for continuous control has been dominated by model-free approaches which, unlike model-based approaches, do not make assumptions about the world dynamics and therefore do not suffer from the representational limitations and model errors that are inevitable in complex domains. However, they require far more experience than model-based approaches, which are typically more sample-efficient. We propose to combine the benefits of the two by presenting an integrated approach called the Curious Meta-Controller. Our approach adaptively alternates between model-based and model-free control using curiosity feedback based on the learning progress of a neural model of the dynamics in a learned latent space. We demonstrate that our approach significantly improves sample efficiency and achieves near-optimal performance when learning robotic reaching and grasping tasks from raw-pixel input in both dense and sparse reward settings.
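The abstract does not spell out the switching rule, so the sketch below is only a minimal illustration of the general idea: a meta-controller that tracks the prediction error of a learned dynamics model, estimates its learning progress over a sliding window, and hands control to the model-based component while that progress remains positive. The class name, the window-based progress measure, and the threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


class CuriousMetaControllerSketch:
    """Toy arbitration between model-based (MB) and model-free (MF) control,
    driven by the learning progress of a latent dynamics model. Hypothetical
    names and update rules; not the authors' implementation."""

    def __init__(self, window=50, threshold=0.0):
        self.errors = []           # recent one-step prediction errors of the dynamics model
        self.window = window       # horizon over which learning progress is estimated
        self.threshold = threshold # minimum progress required to keep using MB control

    def update(self, prediction_error):
        """Record the latest prediction error of the latent dynamics model."""
        self.errors.append(float(prediction_error))

    def learning_progress(self):
        """Learning progress as the drop in mean prediction error between the
        older and the newer half of a sliding window (assumed measure)."""
        if len(self.errors) < 2 * self.window:
            return 0.0
        recent = np.mean(self.errors[-self.window:])
        older = np.mean(self.errors[-2 * self.window:-self.window])
        return older - recent  # positive while the model keeps improving

    def select_controller(self):
        """Prefer model-based control while the dynamics model still improves,
        otherwise fall back to the model-free policy."""
        return "model_based" if self.learning_progress() > self.threshold else "model_free"


if __name__ == "__main__":
    # Toy usage: feed decaying, noisy prediction errors and observe the switch.
    rng = np.random.default_rng(0)
    meta = CuriousMetaControllerSketch(window=20)
    for step in range(400):
        err = np.exp(-step / 100.0) + 0.01 * rng.standard_normal()
        meta.update(err)
    print(meta.select_controller())
```

In this toy run the prediction error flattens out, learning progress falls below the threshold, and control is handed to the model-free policy; the actual paper learns the dynamics model in a latent space from raw pixels rather than on scalar errors supplied by hand.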
