Quanquan Gu | Tianhao Wang | Yifei Min | Dongruo Zhou
[1] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[2] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[3] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[4] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[5] Qiang Liu, et al. Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation, 2019, ICLR.
[6] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[7] Ilya Kostrikov, et al. AlgaeDICE: Policy Gradient from Arbitrary Experience, 2019, arXiv.
[8] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[9] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[10] Zhuoran Yang, et al. Is Pessimism Provably Efficient for Offline RL?, 2020, ICML.
[11] Marco Corazza, et al. Testing different Reinforcement Learning configurations for financial trading: Introduction and applications, 2018.
[12] Sergey Levine, et al. Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[13] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[14] Quanquan Gu, et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping, 2020, ICML.
[15] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[16] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[17] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[18] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.
[19] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[20] Yu Bai, et al. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction, 2021, arXiv.
[21] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[22] Xiangyang Ji, et al. Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP, 2021, arXiv.
[23] Mengdi Wang, et al. Model-Based Reinforcement Learning with Value-Targeted Regression, 2020, L4DC.
[24] Peter L. Bartlett, et al. Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm, 2021, arXiv.
[25] Alessandro Lazaric, et al. Learning Near Optimal Policies with Low Inherent Bellman Error, 2020, ICML.
[26] Quanquan Gu, et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes, 2020, COLT.
[27] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[28] D. Freedman. On Tail Probabilities for Martingales, 1975.
[29] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[30] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[31] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[32] Csaba Szepesvári, et al. CoinDICE: Off-Policy Confidence Interval Estimation, 2020, NeurIPS.
[33] Gergely Neu, et al. A Unifying View of Optimism in Episodic Reinforcement Learning, 2020, NeurIPS.
[34] Yu-Xiang Wang, et al. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning, 2020, AISTATS.
[35] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, 2019.
[36] Liang Tang, et al. Automatic ad format selection via contextual bandits, 2013, CIKM.
[37] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[38] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[39] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.
[40] Roman Vershynin, et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[41] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[42] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[43] Yu Bai, et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning, 2021, AISTATS.
[44] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[45] Quanquan Gu, et al. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation, 2020, ICML.
[46] Ruosong Wang, et al. Instabilities of Offline RL with Pre-Trained Neural Representation, 2021, ICML.
[47] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[48] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[49] Philip S. Thomas, et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing, 2017, AAAI.
[50] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[51] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[52] Quanquan Gu, et al. Learning Stochastic Shortest Path with Linear Function Approximation, 2021, arXiv.