暂无分享,去创建一个
[1] H. Chernoff. Sequential Analysis and Optimal Design , 1987 .
[2] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[3] Benjamin Van Roy,et al. Feature-based methods for large scale dynamic programming , 1995 .
[4] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[5] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning , 1995, NIPS.
[6] K. Ball. An elementary introduction to modern convex geometry, in flavors of geometry , 1997 .
[7] K. Ball. An Elementary Introduction to Modern Convex Geometry , 1997 .
[8] Andrew W. Moore,et al. Barycentric Interpolators for Continuous Space and Time Reinforcement Learning , 1998, NIPS.
[9] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[10] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[11] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[12] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[13] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[14] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[15] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[16] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[17] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[18] M. Dahleh. Laboratory for Information and Decision Systems , 2005 .
[19] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[20] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[21] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[22] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[23] Sham M. Kakade,et al. A tail inequality for quadratic forms of subgaussian random vectors , 2011, ArXiv.
[24] Sham M. Kakade,et al. Towards Minimax Policies for Online Linear Optimization with Bandit Feedback , 2012, COLT.
[25] J. Andrew Bagnell,et al. Agnostic System Identification for Model-Based Reinforcement Learning , 2012, ICML.
[26] Sham M. Kakade,et al. Random Design Analysis of Ridge Regression , 2012, COLT.
[27] Sergey Levine,et al. Offline policy evaluation across representations with applications to educational games , 2014, AAMAS.
[28] Lihong Li,et al. Toward Minimax Off-policy Value Estimation , 2015, AISTATS.
[29] Joel A. Tropp,et al. An Introduction to Matrix Concentration Inequalities , 2015, Found. Trends Mach. Learn..
[30] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[31] Philip S. Thomas,et al. High-Confidence Off-Policy Evaluation , 2015, AAAI.
[32] Philip S. Thomas,et al. Safe Reinforcement Learning , 2015 .
[33] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[34] John Langford,et al. PAC Reinforcement Learning with Rich Observations , 2016, NIPS.
[35] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[36] Philip S. Thomas,et al. Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation , 2017, NIPS.
[37] Miroslav Dudík,et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits , 2016, ICML.
[38] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[39] Marcello Restelli,et al. Boosted Fitted Q-Iteration , 2017, ICML.
[40] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[41] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[42] Srivatsan Srinivasan,et al. Evaluating Reinforcement Learning Algorithms in Observational Health Settings , 2018, ArXiv.
[43] Lu Wang,et al. Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation , 2018, KDD.
[44] Masatoshi Uehara,et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning , 2019, Oper. Res..
[45] Natasha Jaques,et al. Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog , 2019 .
[46] Yuriy Brun,et al. Preventing undesirable behavior of intelligent machines , 2019, Science.
[47] Mykel J. Kochenderfer,et al. Limiting Extrapolation in Linear Approximate Value Iteration , 2019, NeurIPS.
[48] Yifei Ma,et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling , 2019, NeurIPS.
[49] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[50] Romain Laroche,et al. Safe Policy Improvement with Baseline Bootstrapping , 2017, ICML.
[51] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[52] Yifan Wu,et al. Behavior Regularized Offline Reinforcement Learning , 2019, ArXiv.
[53] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[54] Chao Yu,et al. Deep Inverse Reinforcement Learning for Sepsis Treatment , 2019, 2019 IEEE International Conference on Healthcare Informatics (ICHI).
[55] Mohammad Norouzi,et al. An Optimistic Perspective on Offline Deep Reinforcement Learning , 2020, International Conference on Machine Learning.
[56] Ruosong Wang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2020, ICLR.
[57] Nan Jiang,et al. $Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison , 2020, 2003.03924.
[58] Masatoshi Uehara,et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes , 2019, J. Mach. Learn. Res..
[59] T. Joachims,et al. MOReL : Model-Based Offline Reinforcement Learning , 2020, NeurIPS.
[60] Ruosong Wang,et al. On Reward-Free Reinforcement Learning with Linear Function Approximation , 2020, NeurIPS.
[61] Rishabh Agarwal,et al. An Optimistic Perspective on Offline Reinforcement Learning , 2019, ICML.
[62] Qiang Liu,et al. Accountable Off-Policy Evaluation With Kernel Bellman Statistics , 2020, ICML.
[63] Jiawei Huang,et al. Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization , 2020, ArXiv.
[64] Masatoshi Uehara,et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation , 2019, ICML.
[65] Yao Liu,et al. Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling , 2019, ICML.
[66] Sergey Levine,et al. DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction , 2020, NeurIPS.
[67] Xi Chen,et al. On the Sample Complexity of Reinforcement Learning with Policy Space Generalization , 2020, ArXiv.
[68] Nan Jiang,et al. Batch Value-function Approximation with Only Realizability , 2020, ICML.