Multi-View Reinforcement Learning

This paper is concerned with multi-view reinforcement learning (MVRL), which enables decision making when agents share common dynamics but adhere to different observation models. We define the MVRL framework by extending partially observable Markov decision processes (POMDPs) to support more than one observation model, and we propose two solution methods based on observation augmentation and cross-view policy transfer. We empirically evaluate our methods and demonstrate their effectiveness in a variety of environments, showing reductions in the sample complexity and computation time required to acquire policies that handle multi-view environments.
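
The extension described above can be made concrete with a minimal sketch; the notation below assumes the standard POMDP tuple and is illustrative only, as the paper's exact symbols may differ. A POMDP is the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \Omega, \mathcal{O}, \gamma)$, with observation space $\Omega$ and observation model $\mathcal{O}(o \mid s', a)$. A multi-view extension keeps the shared state space, dynamics, and reward, but attaches one observation space and observation model per view $v \in \{1, \dots, V\}$:

\[
\big(\mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \{\Omega_v, \mathcal{O}_v\}_{v=1}^{V}, \gamma\big),
\qquad \mathcal{O}_v(o_v \mid s', a),
\]

so that every view is generated by the same latent dynamics $\mathcal{T}(s' \mid s, a)$ and reward $\mathcal{R}$, while each view $v$ observes the state through its own model $\mathcal{O}_v$.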
