暂无分享,去创建一个
[1] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[2] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[3] Masatoshi Uehara,et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation , 2019, ICML.
[4] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[5] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[6] Bruno Scherrer,et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes , 2012, NIPS.
[7] Nan Jiang,et al. Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon , 2018, COLT.
[8] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[9] Stefan Wager,et al. Augmented minimax linear estimation , 2017, The Annals of Statistics.
[10] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[11] Yi Su,et al. Doubly robust off-policy evaluation with shrinkage , 2019, ICML.
[12] David A. Hirshberg,et al. Balancing Out Regression Error: Efficient Treatment Effect Estimation without Smooth Propensities , 2017 .
[13] A. Müller. Integral Probability Metrics and Their Generating Classes of Functions , 1997, Advances in Applied Probability.
[14] Sham M. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[15] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[16] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[17] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[18] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[19] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[20] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[21] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[22] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[23] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[24] Nathan Kallus,et al. Generalized Optimal Matching Methods for Causal Inference , 2016, J. Mach. Learn. Res..
[25] Qiang Liu,et al. A Kernel Loss for Solving the Bellman Equation , 2019, NeurIPS.
[26] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[27] Balas K. Natarajan,et al. On learning sets and functions , 2004, Machine Learning.
[28] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[29] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[30] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[31] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[32] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[33] Matthieu Geist,et al. Is the Bellman residual a bad proxy? , 2016, NIPS.
[34] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.