Dynamic policy programming
[1] G. Pisier,et al. The Law of Large Numbers and the Central Limit Theorem in Banach Spaces , 1976 .
[2] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[3] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[4] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[5] Reid G. Simmons,et al. Complexity Analysis of Real-Time Reinforcement Learning , 1993, AAAI.
[6] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[7] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[8] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[9] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[10] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[11] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[12] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[13] Xiao-Li Meng,et al. Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling , 1998 .
[14] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[15] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[16] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[17] Benjamin Van Roy,et al. On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000 .
[18] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[19] Yishay Mansour,et al. Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..
[20] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[21] Peter L. Bartlett,et al. An Introduction to Reinforcement Learning Theory: Value Function Methods , 2002, Machine Learning Summer School.
[22] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[23] Jeff G. Schneider,et al. Covariant Policy Search , 2003, IJCAI.
[24] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[25] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[26] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[27] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[28] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[29] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[30] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[31] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[32] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[33] H. Kappen. Path integrals and symmetry breaking for optimal control theory , 2005, physics/0505066.
[34] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[35] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[36] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[37] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[38] Tao Wang,et al. Stable Dual Dynamic Programming , 2007, NIPS.
[39] Tao Wang,et al. Dual Representations for Dynamic Programming and Reinforcement Learning , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[40] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[41] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[42] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[43] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2008, NIPS 2008.
[44] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[45] Marc Toussaint,et al. Model-free reinforcement learning as mixture learning , 2009, ICML '09.
[46] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[47] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009 .
[48] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[49] Shie Mannor,et al. Regularized Fitted Q-iteration: Application to Planning , 2008, EWRL.
[50] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[51] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control 3rd Edition, Volume II , 2010 .
[52] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[53] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[54] Csaba Szepesvári,et al. Model-based reinforcement learning with nearly tight exploration complexity bounds , 2010, ICML.
[55] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[56] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[57] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[58] Doina Precup,et al. An information-theoretic approach to curiosity-driven reinforcement learning , 2012, Theory in Biosciences.
[59] Jan Peters,et al. Hierarchical Relative Entropy Policy Search , 2014, AISTATS.
[60] Vicenç Gómez,et al. Optimal control as a graphical model inference problem , 2009, Machine Learning.