Model-free reinforcement learning as mixture learning