Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment
Xing Hu | Zidong Du | Yunji Chen | Xishan Zhang | Rui Zhang | Qi Guo | Shaohui Peng | Jiaming Guo | Qi Yi
[1] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[2] Pieter Abbeel, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016, NIPS.
[3] Editors, 2003.
[4] Zhe Gan, et al. CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information, 2020, ICML.
[5] H. J. Mclaughlin, et al. Learn, 2002.
[6] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[9] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[10] Vladimir Pavlovic, et al. Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach, 2018, IEEE Transactions on Image Processing.
[11] Naftali Tishby, et al. The information bottleneck method, 2000, ArXiv.
[12] Sergey Levine, et al. Dynamics-Aware Unsupervised Discovery of Skills, 2019, ICLR.
[13] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[16] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[17] Joelle Pineau, et al. Novelty Search in representational space for sample efficient exploration, 2020, NeurIPS.
[18] Hongzi Mao, et al. Variance Reduction for Reinforcement Learning in Input-Driven Environments, 2018, ICLR.
[19] Alexandre M. Bayen, et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines, 2018, ICLR.