A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits
[1] M. Ghavamzadeh,et al. On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes , 2023, arXiv:2304.12477.
[2] B. Kveton,et al. Multi-Task Off-Policy Learning from Bandit Feedback , 2022, ICML.
[3] S. Levine,et al. Offline RL Policies Should be Trained to be Adaptive , 2022, ICML.
[4] Juan Pablo Vielma,et al. JuMP 1.0: recent improvements to a modeling language for mathematical optimization , 2022, Mathematical Programming Computation.
[5] Alekh Agarwal,et al. Adversarially Trained Actor Critic for Offline Reinforcement Learning , 2022, ICML.
[6] Masatoshi Uehara,et al. Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage , 2021, ICLR.
[7] Anca D. Dragan,et al. Policy Gradient Bayesian Robust Optimization for Imitation Learning , 2021, ICML.
[8] Tor Lattimore,et al. On the Optimality of Batch Policy Optimization Algorithms , 2021, ICML.
[9] Stuart J. Russell,et al. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism , 2021, IEEE Transactions on Information Theory.
[10] Brian T. Denton,et al. Multi-model Markov decision processes , 2021, IISE Trans..
[11] D. Bertsimas,et al. Probabilistic Guarantees in Robust Optimization , 2021, SIAM Journal on Optimization.
[12] Zhuoran Yang,et al. Is Pessimism Provably Efficient for Offline RL? , 2020, ICML.
[13] Mohammad Ghavamzadeh,et al. Soft-Robust Algorithms for Handling Model Misspecification , 2020, ArXiv.
[14] Marek Petrik,et al. Bayesian Robust Optimization for Imitation Learning , 2020, NeurIPS.
[15] Vishal Gupta,et al. Near-Optimal Bayesian Ambiguity Sets for Distributionally Robust Optimization , 2019, Manag. Sci..
[16] Marek Petrik,et al. Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs , 2019, NeurIPS.
[17] Michael I. Jordan,et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm , 2019, ArXiv.
[18] Yong Xia,et al. Chebyshev center of the intersection of balls: complexity, relaxation and approximation , 2019, Mathematical Programming.
[19] Timothy A. Mann,et al. Soft-Robust Actor-Critic Policy-Gradient , 2018, UAI 2018.
[20] Amir Ahmadi-Javid,et al. Entropic Value-at-Risk: A New Coherent Risk Measure , 2012, J. Optim. Theory Appl..
[21] Sham M. Kakade,et al. A tail inequality for quadratic forms of subgaussian random vectors , 2011, ArXiv.
[22] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[23] John K Kruschke,et al. Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.
[24] Alexander Shapiro,et al. Lectures on Stochastic Programming: Modeling and Theory , 2009.
[25] James R. Luedtke,et al. A Sample Approximation Approach for Optimization with Probabilistic Constraints , 2008, SIAM J. Optim..
[26] Alexander Shapiro,et al. Convex Approximations of Chance Constrained Programs , 2006, SIAM J. Optim..
[27] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[28] P. Massart,et al. Adaptive estimation of a quadratic functional by model selection , 2000.
[29] János D. Pintér,et al. Deterministic approximations of probability inequalities , 1989, ZOR Methods Model. Oper. Res..
[30] M. Ghavamzadeh,et al. Entropic Risk Optimization in Discounted MDPs , 2023, AISTATS.
[31] Marek Petrik,et al. Optimizing Percentile Criterion using Robust MDPs , 2021, AISTATS.
[32] Caroline Ponzoni Carvalho Chanel,et al. Exploitation vs Caution: Risk-sensitive Policies for Offline Learning , 2021, ArXiv.
[33] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[34] Shie Mannor,et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty , 2010, Oper. Res..
[35] Laurent El Ghaoui,et al. Robust Optimization , 2021, ICORES.
[36] A. Nemirovski,et al. Scenario Approximations of Chance Constraints , 2006.
[37] Giuseppe Carlo Calafiore,et al. Uncertain convex programs: randomized solutions and confidence levels , 2005, Math. Program..
[38] H. Föllmer,et al. Stochastic Finance: An Introduction in Discrete Time , 2002.