On the Optimality of Batch Policy Optimization Algorithms
Chenjun Xiao | Yifan Wu | Jincheng Mei | Bo Dai | Tor Lattimore | Lihong Li | Csaba Szepesvári | Dale Schuurmans
[1] John Langford, et al. Empirical Likelihood for Contextual Bandits, 2019, NeurIPS.
[2] Rahul Kidambi, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[3] Milton Abramowitz, et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 1964.
[4] Shie Mannor, et al. Distributional Robustness and Regularization in Reinforcement Learning, 2020, arXiv.
[5] John Duchi, et al. Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach, 2016, Math. Oper. Res.
[6] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.
[7] I. Gilboa, et al. Maxmin Expected Utility with Non-Unique Prior, 1989.
[8] Emma Brunskill, et al. Provably Good Batch Reinforcement Learning Without Great Exploration, 2020, arXiv.
[9] Thorsten Joachims, et al. Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, 2015, J. Mach. Learn. Res.
[10] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[11] Daniel Kuhn, et al. From Data to Decisions: Distributionally Robust Optimization is Optimal, 2017, Manag. Sci.
[12] Daniel Kuhn, et al. A General Framework for Optimal Data-Driven Optimization, 2020, arXiv:2010.06606.
[13] Shie Mannor, et al. Distributionally Robust Markov Decision Processes, 2010, Math. Oper. Res.
[14] Yu Bai, et al. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction, 2021, arXiv.
[15] J. Berger. Statistical Decision Theory and Bayesian Analysis, 1988.
[16] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[17] Marc G. Bellemare, et al. The Importance of Pessimism in Fixed-Dataset Policy Optimization, 2020, arXiv.
[18] Robert L. Winkler, et al. The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis, 2006, Manag. Sci.
[19] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[20] Huan Xu, et al. Distributionally Robust Counterpart in Markov Decision Processes, 2015, IEEE Transactions on Automatic Control.
[21] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[22] Csaba Szepesvári, et al. CoinDICE: Off-Policy Confidence Interval Estimation, 2020, NeurIPS.
[23] Martin A. Riedmiller, et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning, 2020, ICLR.
[24] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, arXiv.
[25] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[26] Henry Lam, et al. Recovering Best Statistical Guarantees via the Empirical Divergence-Based Distributionally Robust Optimization, 2016, Oper. Res.
[27] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.
[28] Zhi Chen, et al. Distributionally Robust Optimization for Sequential Decision-Making, 2018, Optimization.
[29] Zhuoran Yang, et al. Is Pessimism Provably Efficient for Offline RL?, 2020, ICML.
[30] Tor Lattimore, et al. Bandit Algorithms, 2020.
[31] Insoon Yang, et al. A Convex Optimization Approach to Distributionally Robust Markov Decision Processes With Wasserstein Distance, 2017, IEEE Control Systems Letters.
[32] J. Neyman, et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses, 1933.
[33] Elena Smirnova, et al. Distributionally Robust Counterfactual Risk Minimization, 2019, AAAI.
[34] Daniel Kuhn, et al. "Dice"-sion Making Under Uncertainty: When Can a Random Decision Reduce Risk?, 2016, Manag. Sci.
[35] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[36] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.