The Symmetry between Arms and Knapsacks: A Primal-Dual Approach for Bandits with Knapsacks
[1] Sahil Singla, et al. Online Learning with Vector Costs and Bandits with Knapsacks, 2020, COLT.
[2] Aleksandrs Slivkins, et al. Advances in Bandits with Knapsacks, 2020, arXiv.
[3] Y. Ye, et al. Online Linear Programming: Dual Convergence, New Algorithms, and Regret Bounds, 2019, Oper. Res.
[4] Nicole Immorlica, et al. Adversarial Bandits with Knapsacks, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).
[5] David Simchi-Levi, et al. Online Network Revenue Management Using Thompson Sampling, 2017, Oper. Res.
[6] Itay Gurvich, et al. Uniformly bounded regret in the multi-secretary problem, 2017, Stochastic Systems.
[7] Patrick Jaillet, et al. Logarithmic regret bounds for Bandits with Knapsacks, 2015, arXiv:1510.01800.
[8] Nikhil R. Devanur, et al. Linear Contextual Bandits with Knapsacks, 2015, NIPS.
[9] Nikhil R. Devanur, et al. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives, 2015, COLT.
[10] Nikhil R. Devanur, et al. Bandits with concave rewards and convex knapsacks, 2014, EC.
[11] Hamid Nazerzadeh, et al. Real-time optimization of personalized assortments, 2013, EC '13.
[12] Aleksandrs Slivkins, et al. Bandits with Knapsacks, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS).
[13] Sunil Kumar, et al. A Re-Solving Heuristic with Bounded Revenue Loss for Network Revenue Management with Customer Choice, 2012, Math. Oper. Res.
[14] Omar Besbes, et al. Blind Network Revenue Management, 2011, Oper. Res.
[15] Amin Saberi, et al. Online stochastic matching: online actions based on offline statistics, 2010, SODA '11.
[16] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[17] Y. Ye, et al. A Dynamic Near-Optimal Algorithm for Online Linear Programming, 2009, Oper. Res.
[18] Aranyak Mehta, et al. AdWords and generalized on-line matching, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05).
[19] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[20] R. Weber. On the Gittins Index for Multiarmed Bandits, 1992.
[21] N. Megiddo, et al. On the ε-perturbation method for avoiding degeneracy, 1989.
[22] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[23] J. C. Gittins. Multi-armed Bandit Allocation Indices, 1989.
[24] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[25] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2012, 25th Annual Conference on Learning Theory (COLT).