The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation