Gambler Bandits and the Regret of Being Ruined
Mathieu Bourgais | Filipo Studzinski Perotto | Sattar Vakili | Yaser Faghan | Pratik Gajane