暂无分享,去创建一个
[1] Csaba Szepesvari,et al. Bandit Algorithms , 2020 .
[2] Nikhil R. Devanur,et al. Bandits with concave rewards and convex knapsacks , 2014, EC.
[3] Mohammad Ghavamzadeh,et al. Stochastic Bandits with Linear Constraints , 2020, AISTATS.
[4] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[5] Christos Thrampoulidis,et al. Linear Stochastic Bandits Under Safety Constraints , 2019, NeurIPS.
[6] Xiaohan Wei,et al. Online Convex Optimization with Stochastic Constraints , 2017, NIPS.
[7] Michael J. Neely,et al. Energy-Aware Wireless Scheduling With Near-Optimal Backlog and Convergence Time Tradeoffs , 2014, IEEE/ACM Transactions on Networking.
[8] Haipeng Luo,et al. Fair Contextual Multi-Armed Bandits: Theory and Experiments , 2019, UAI.
[9] Lei Ying,et al. POND: Pessimistic-Optimistic oNline Dispatch , 2020, ArXiv.
[10] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[11] Jia Liu,et al. Combinatorial Sleeping Bandits with Fairness Constraints , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
[12] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[13] John Langford,et al. Resourceful Contextual Bandits , 2014, COLT.
[14] R. Srikant,et al. Asymptotically tight steady-state queue length bounds implied by drift conditions , 2011, Queueing Syst. Theory Appl..
[15] Tor Lattimore,et al. Refined Lower Bounds for Adversarial Bandits , 2016, NIPS.
[16] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[17] Philip M. Long,et al. Associative Reinforcement Learning using Linear Probabilistic Concepts , 1999, ICML.
[18] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[19] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[20] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[21] R. Srikant,et al. Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits , 2015, NIPS.
[22] Atilla Eryilmaz,et al. Budget-Constrained Bandits over General Cost and Reward Distributions , 2020, AISTATS.
[23] David Simchi-Levi,et al. Online Network Revenue Management Using Thompson Sampling , 2017, Oper. Res..
[24] R. Srikant,et al. Bandits with Budgets , 2015, SIGMETRICS.
[25] Andreas Krause,et al. Safe Convex Learning under Uncertain Constraints , 2019, AISTATS.
[26] Lei Ying,et al. Communication Networks - An Optimization, Control, and Stochastic Networks Perspective , 2014 .
[27] Fan Chung Graham,et al. Concentration Inequalities and Martingale Inequalities: A Survey , 2006, Internet Math..
[28] Hao Yu,et al. A Low Complexity Algorithm with O(√T) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints , 2020, J. Mach. Learn. Res..
[29] Xiaohan Wei,et al. Online Primal-Dual Mirror Descent under Stochastic Constraints , 2019, Proc. ACM Meas. Anal. Comput. Syst..
[30] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[31] Nikhil R. Devanur,et al. Linear Contextual Bandits with Knapsacks , 2015, NIPS.
[32] Nikhil R. Devanur,et al. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives , 2015, COLT.
[33] Rong Jin,et al. Trading regret for efficiency: online convex optimization with long term constraints , 2011, J. Mach. Learn. Res..
[34] Aleksandrs Slivkins,et al. Bandits with Knapsacks , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[35] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[36] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[37] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[38] B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications , 1982, Advances in Applied Probability.
[39] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable , 1979 .