The Price of Bandit Information for Online Optimization
暂无分享,去创建一个
[1] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[2] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[3] Manfred K. Warmuth,et al. Path Kernels and Multiplicative Updates , 2002, J. Mach. Learn. Res..
[4] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[5] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.
[6] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, Journal of computer and system sciences (Print).
[7] Thomas P. Hayes,et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary , 2006, SODA '06.
[8] Magyar Tud. The On-Line Shortest Path Problem Under Partial Monitoring , 2007 .
[9] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[10] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[11] Thomas P. Hayes,et al. High-Probability Regret Bounds for Bandit Online Linear Optimization , 2008, COLT.
[12] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .