[1] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[2] Richard Socher, et al. Revisiting Activation Regularization for Language RNNs, 2017, arXiv.
[3] Joelle Pineau, et al. Separating value functions across time-scales, 2019, ICML.
[4] Olivier Sigaud, et al. Investigating Generalisation in Continuous Deep Reinforcement Learning, 2019, arXiv.
[5] Marek Petrik, et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor, 2008, NIPS.
[6] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[7] D. Jerison, et al. General mixing time bounds for finite Markov chains via the absolute spectral gap, 2013, arXiv:1310.8021.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[9] Shie Mannor, et al. Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons, 2020.
[10] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[11] Nan Jiang, et al. On Structural Properties of MDPs that Bound Loss Due to Shallow Planning, 2016, IJCAI.
[12] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[13] Nan Jiang, et al. Abstraction Selection in Model-based Reinforcement Learning, 2015, ICML.
[14] Rutherford Aris, et al. Discrete Dynamic Programming, 1965, The Mathematical Gazette.
[15] Trevor Darrell, et al. Regularization Matters in Policy Optimization, 2019, arXiv.
[16] Samy Bengio, et al. A Study on Overfitting in Deep Reinforcement Learning, 2018, arXiv.
[17] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[18] Damien Ernst, et al. On overfitting and asymptotic bias in batch reinforcement learning with partial observability, 2017, J. Artif. Intell. Res.
[19] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[20] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[21] Patrick M. Pilarski, et al. Gamma-Nets: Generalizing Value Estimation over Timescale, 2019, AAAI.
[22] Marlos C. Machado, et al. Generalization and Regularization in DQN, 2018, arXiv.
[23] Mohammad Emtiyaz Khan, et al. TD-regularized actor-critic methods, 2018, Machine Learning.
[24] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[25] R. Bellman. A Markovian Decision Process, 1957.
[26] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[27] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.
[28] Mykel J. Kochenderfer, et al. Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors, 2018, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[29] V. Climenhaga. Markov chains and mixing times, 2013.
[30] Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2018, ICML.
[31] Hermann Ney, et al. Improving Neural Language Models with Weight Norm Initialization and Regularization, 2018, WMT.
[32] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[33] Mohammad Ghavamzadeh, et al. Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning, 2019, arXiv.
[34] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[35] Shie Mannor, et al. Beyond the One Step Greedy Approach in Reinforcement Learning, 2018, ICML.
[36] Richard Socher, et al. On the Generalization Gap in Reparameterizable Reinforcement Learning, 2019, ICML.
[37] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[38] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[39] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[40] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML.
[41] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[42] Shie Mannor, et al. Maximizing the Total Reward via Reward Tweaking, 2020, arXiv.
[43] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[44] Harm van Seijen, et al. Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning, 2019, NeurIPS.
[45] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[46] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[47] Taehoon Kim, et al. Quantifying Generalization in Reinforcement Learning, 2018, ICML.
[48] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[49] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[50] Joelle Pineau, et al. A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning, 2018, arXiv.
[51] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.
[52] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[53] Bruno Scherrer, et al. Leverage the Average: an Analysis of Regularization in RL, 2020, arXiv.
[54] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[55] R. Forthofer, et al. Rank Correlation Methods, 1981.
[56] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[57] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.