Learning Dynamics and Generalization in Deep Reinforcement Learning