Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Thomas Mesnard | Théophane Weber | Fabio Viola | Shantanu Thakoor | Alaa Saade | Anna Harutyunyan | Will Dabney | Tom Stepleton | Nicolas Heess | Arthur Guez | Marcus Hutter | Lars Buesing | Rémi Munos
[1] Filipe Wall Mutz,et al. Hindsight policy gradients , 2017, ICLR.
[2] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..
[3] Jürgen Schmidhuber,et al. World Models , 2018, ArXiv.
[4] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[5] Dmitry Vetrov,et al. Towards Practical Credit Assignment for Deep Reinforcement Learning , 2021, ArXiv.
[6] Doina Precup,et al. Policy Gradients Incorporating the Future , 2021, ICLR.
[7] Eric Nalisnick,et al. Normalizing Flows for Probabilistic Modeling and Inference , 2019, J. Mach. Learn. Res..
[8] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[9] Marvin Minsky,et al. Steps toward Artificial Intelligence , 1961, Proceedings of the IRE.
[10] Fabio Viola,et al. Taming VAEs , 2018, ArXiv.
[11] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[12] P. Glasserman,et al. Some Guidelines and Guarantees for Common Random Numbers , 1992 .
[13] Yan Wu,et al. Optimizing agent behavior over long time scales by transporting value , 2018, Nature Communications.
[14] Pieter Abbeel,et al. Benchmarking Model-Based Reinforcement Learning , 2019, ArXiv.
[15] Chris Nota,et al. Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods , 2021, ICML.
[16] Doina Precup,et al. Hindsight Credit Assignment , 2019, NeurIPS.
[17] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[18] Trevor Darrell,et al. Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Junzhe Zhang,et al. Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach , 2020, ICML.
[20] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[21] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[22] Doina Precup,et al. Value-driven Hindsight Modelling , 2020, NeurIPS.
[23] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[24] Bernhard Schölkopf,et al. Recurrent Independent Mechanisms , 2021, ICLR.
[25] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[26] Sepp Hochreiter,et al. RUDDER: Return Decomposition for Delayed Rewards , 2018, NeurIPS.
[27] Tie-Yan Liu,et al. Independence-aware Advantage Estimation , 2019, IJCAI.
[28] Razvan Pascanu,et al. Stabilizing Transformers for Reinforcement Learning , 2019, ICML.
[29] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[30] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[31] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[32] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[33] Jessica B. Hamrick,et al. Analogues of mental simulation and imagination in deep learning , 2019, Current Opinion in Behavioral Sciences.
[34] David Silver,et al. Credit Assignment Techniques in Stochastic Computation Graphs , 2019, AISTATS.
[35] Yoshua Bengio,et al. Gated Feedback Recurrent Neural Networks , 2015, ICML.
[36] David Sontag,et al. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models , 2019, ICML.
[37] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[38] J. Pearl. Causality: Models, Reasoning and Inference , 2000 .
[39] Mihaela van der Schaar,et al. Estimating Counterfactual Treatment Outcomes over Time Through Adversarially Balanced Representations , 2020, ICLR.
[40] Kenny Young. Variance Reduced Advantage Estimation with δ Hindsight Credit Assignment , 2019, ArXiv.
[41] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[42] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[43] Matthieu Geist,et al. Credit Assignment as a Proxy for Transfer in Reinforcement Learning , 2019, ArXiv.
[44] Hongzi Mao,et al. Variance Reduction for Reinforcement Learning in Input-Driven Environments , 2018, ICLR.
[45] Alexandre M. Bayen,et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines , 2018, ICLR.
[46] Nicolas Heess,et al. Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search , 2018, ICLR.
[47] Marc G. Bellemare,et al. Compress and Control , 2015, AAAI.
[48] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..
[49] Shie Mannor,et al. A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..
[50] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[51] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[52] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[53] T. Weber,et al. Stochastic Gradient Estimation With Finite Differences , 2016 .