Is the Bellman residual a bad proxy?