Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
[1] Tor Lattimore, et al. On the Optimality of Batch Policy Optimization Algorithms, 2021, ICML.
[2] Sergey Levine, et al. COMBO: Conservative Offline Model-Based Policy Optimization, 2021, NeurIPS.
[3] Masatoshi Uehara, et al. Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency, 2021, arXiv.
[4] Yu-Xiang Wang, et al. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction, 2021, NeurIPS.
[5] Martin J. Wainwright, et al. Minimax Off-Policy Evaluation for Multi-Armed Bandits, 2021, IEEE Transactions on Information Theory.
[6] Zhuoran Yang, et al. Is Pessimism Provably Efficient for Offline RL?, 2020, ICML.
[7] Pang Wei Koh, et al. WILDS: A Benchmark of in-the-Wild Distribution Shifts, 2020, ICML.
[8] Andrea Zanette, et al. Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL, 2020, ICML.
[9] Tor Lattimore, et al. Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient, 2020, ICML.
[10] Yang Yu, et al. Error Bounds of Imitating Policies and Environments for Reinforcement Learning, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Ruosong Wang, et al. What are the Statistical Limits of Offline RL with Linear Function Approximation?, 2020, ICLR.
[12] Michal Valko, et al. Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited, 2020, ALT.
[13] Marc G. Bellemare, et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization, 2020, ICLR.
[14] Lin F. Yang, et al. Toward the Fundamental Limits of Imitation Learning, 2020, NeurIPS.
[15] Nan Jiang, et al. Batch Value-function Approximation with Only Realizability, 2020, ICML.
[16] S. Murphy, et al. Batch Policy Learning in Average Reward Markov Decision Processes, 2020, Annals of Statistics.
[17] Seyed Kamyar Seyed Ghasemipour, et al. EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL, 2020, ICML.
[18] Emma Brunskill, et al. Provably Good Batch Reinforcement Learning Without Great Exploration, 2020, arXiv.
[19] Yu Bai, et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning, 2020, arXiv.
[20] Alec Koppel, et al. Variational Policy Gradient Method for Reinforcement Learning with General Utilities, 2020, NeurIPS.
[21] Nando de Freitas, et al. Critic Regularized Regression, 2020, NeurIPS.
[22] Sergio Gomez Colmenarejo, et al. RL Unplugged: Benchmarks for Offline Reinforcement Learning, 2020, arXiv.
[23] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[24] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[25] Yuxin Chen, et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model, 2020, NeurIPS.
[26] T. Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[27] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.
[28] Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, arXiv.
[29] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[30] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[31] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[32] S. Whiteson, et al. GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values, 2020, ICML.
[33] Bo Dai, et al. Reinforcement Learning via Fenchel-Rockafellar Duality, 2020, arXiv.
[34] Ilya Kostrikov, et al. AlgaeDICE: Policy Gradient from Arbitrary Experience, 2019, arXiv.
[35] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[36] Lin F. Yang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2019, ICLR.
[37] Joelle Pineau, et al. Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019, arXiv.
[38] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, arXiv.
[39] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.
[40] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[41] S. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[42] Romain Laroche, et al. Safe Policy Improvement with Soft Baseline Bootstrapping, 2019, ECML/PKDD.
[43] Rishabh Agarwal, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2019, ICML.
[44] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, arXiv.
[45] Alexander Carballo, et al. A Survey of Autonomous Driving: Common Practices and Emerging Technologies, 2019, IEEE Access.
[46] Lin F. Yang, et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2019, COLT.
[47] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[48] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[49] Nan Jiang, et al. On Value Functions and the Agent-Environment Boundary, 2019, arXiv.
[50] Qiang Liu, et al. A Kernel Loss for Solving the Bellman Equation, 2019, NeurIPS.
[51] Xinkun Nie, et al. Learning When-to-Treat Policies, 2019, Journal of the American Statistical Association.
[52] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[53] Fredrik D. Johansson, et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[54] Tim Salimans, et al. Learning Montezuma's Revenge from a Single Demonstration, 2018, arXiv.
[55] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[56] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[57] Lu Wang, et al. Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation, 2018, KDD.
[58] Lin F. Yang, et al. Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model, 2018, arXiv:1806.01492.
[59] Romain Laroche, et al. Safe Policy Improvement with Baseline Bootstrapping, 2017, ICML.
[60] Xian Wu, et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes, 2017, SODA.
[61] Byron Boots, et al. Agile Autonomous Driving using End-to-End Deep Imitation Learning, 2017, Robotics: Science and Systems.
[62] Philip S. Thomas, et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing, 2017, AAAI.
[63] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[64] John C. Duchi, et al. Variance-based Regularization with Convex Objectives, 2016, NIPS.
[65] Yanjun Han, et al. Minimax estimation of the L1 distance, 2016, IEEE International Symposium on Information Theory (ISIT).
[66] Matthieu Geist, et al. Is the Bellman residual a bad proxy?, 2016, NIPS.
[67] Xin Zhang, et al. End to End Learning for Self-Driving Cars, 2016, arXiv.
[68] John Langford, et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[69] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[70] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[71] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[72] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[73] Thorsten Joachims, et al. Counterfactual Risk Minimization, 2015, ICML.
[74] Boi Faltings, et al. Offline and online evaluation of news recommender systems at swissinfo.ch, 2014, RecSys '14.
[75] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[76] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[77] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[78] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[79] Rémi Munos, et al. Pure exploration in finitely-armed and continuous-armed bandits, 2011, Theoretical Computer Science.
[80] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[81] U. Rieder, et al. Markov Decision Processes, 2010.
[82] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[83] J. Andrew Bagnell, et al. Efficient Reductions for Imitation Learning, 2010, AISTATS.
[84] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[85] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[86] Rémi Munos. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM Journal on Control and Optimization.
[87] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[88] Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
[89] Rémi Munos. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[90] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[91] E. N. Gilbert. A comparison of signalling alphabets, 1952, Bell System Technical Journal.
[92] Sergio Gomez Colmenarejo, et al. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning, 2020.
[93] Emma Brunskill, et al. Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration, 2020, NeurIPS.
[94] Qi Cai, et al. Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy, 2019, NeurIPS.
[95] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[96] Bin Yu. Assouad, Fano, and Le Cam, 1997, Festschrift for Lucien Le Cam.
[97] L. Le Cam. Asymptotic Methods in Statistical Decision Theory, 1986, Springer.
[98] L. Goddard. Information Theory, 1962, Nature.