Reward-Weighted Regression Converges to a Global Optimum
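The title refers to reward-weighted regression (RWR), an EM-style method that repeatedly refits the policy by reward-weighted maximum likelihood. A minimal sketch on a one-dimensional bandit with a Gaussian policy (the reward shape, `target` value, and all parameters below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
target = 2.0  # action with maximal reward (assumed for this toy example)

def reward(a):
    # Strictly positive reward, peaked at `target`.
    return np.exp(-0.5 * (a - target) ** 2)

# Gaussian policy parameters, deliberately initialized far from the optimum.
mean, std = 0.0, 2.0
for _ in range(50):
    actions = rng.normal(mean, std, size=1000)  # sample from current policy
    w = reward(actions)                          # reward weights
    # Weighted maximum-likelihood refit of the policy (the RWR update).
    mean = np.sum(w * actions) / np.sum(w)
    std = np.sqrt(np.sum(w * (actions - mean) ** 2) / np.sum(w)) + 1e-3

# The policy mean drifts toward the reward-maximizing action while the
# variance contracts, illustrating the convergence behavior the title claims.
```

Each iteration is a closed-form weighted regression, which is what makes the scheme simple to analyze compared with gradient-based policy search.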
[1] Sameera S. Ponda, et al. Autonomous navigation of stratospheric balloons using reinforcement learning, 2020, Nature.
[2] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, arXiv.
[3] Yuval Tassa, et al. Relative Entropy Regularized Policy Iteration, 2018, arXiv.
[4] S. Ana, et al. Topology, 2018, International Journal of Mathematics Trends and Technology.
[5] Marcin Andrychowicz, et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research, 2018, arXiv.
[6] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[7] Masashi Sugiyama, et al. Hierarchical Policy Search via Return-Weighted Density Estimation, 2017, AAAI.
[8] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[9] Yoshinobu Kawahara, et al. Weighted Likelihood Policy Search with Model Selection, 2012, NIPS.
[10] Jan Peters, et al. Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning, 2011, Neural Computation.
[11] Gerhard Neumann, et al. Variational Inference for Policy Search in Changing Situations, 2011, ICML.
[12] Jan Peters, et al. Relative Entropy Policy Search, 2010, AAAI.
[13] Masashi Sugiyama, et al. Efficient Sample Reuse in EM-Based Policy Search, 2009, ECML/PKDD.
[14] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[15] Jan Peters, et al. Fitted Q-iteration by Advantage Weighted Regression, 2008, NIPS.
[16] Tom Schaul, et al. Fitness Expectation Maximization, 2008, PPSN.
[17] Tom Schaul, et al. Episodic Reinforcement Learning by Logistic Reward-Weighted Regression, 2008, ICANN.
[18] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[19] Stefan Schaal, et al. Learning to Control in Operational Space, 2008, Int. J. Robotics Res.
[20] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[21] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML.
[22] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[23] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[24] D. Pollard. A User's Guide to Measure-Theoretic Probability, 2002, Cambridge University Press.
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[27] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[28] M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[29] C. F. J. Wu. On the Convergence Properties of the EM Algorithm, 1983, Annals of Statistics.
[30] A. Dempster, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977, Journal of the Royal Statistical Society, Series B.
[31] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[32] C. von der Malsburg. Self-organization of orientation sensitive cells in the striate cortex, 1973, Kybernetik.
[34] Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[35] P. Billingsley. Convergence of Probability Measures, 1968, Wiley.
[36] W. Rudin. Principles of Mathematical Analysis, 1964.
[37] R. L. Stratonovich. Conditional Markov Processes, 1960.
[38] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2011, Machine Learning.