Bandits with switching costs: T^{2/3} regret
Yuval Peres | Ofer Dekel | Jian Ding | Tomer Koren
[1] Berthold Vöcking, et al. Regret Minimization for Online Buffering Problems Using the Weighted Majority Algorithm, 2010, Electron. Colloquium Comput. Complex.
[2] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[3] Paul D. Seymour, et al. Graphs with small bandwidth and cutwidth, 1989, Discret. Math.
[4] Ambuj Tewari, et al. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret, 2012, ICML.
[5] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[6] András György, et al. Near-Optimal Rates for Limited-Delay Universal Lossy Source Coding, 2014, IEEE Transactions on Information Theory.
[7] Nicolò Cesa-Bianchi, et al. Online Learning with Switching Costs and Other Adaptive Adversaries, 2013, NIPS.
[8] David Haussler, et al. How to use expert advice, 1993, STOC.
[9] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.
[10] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[11] Andrew Chi-Chih Yao, et al. Probabilistic computations: Toward a unified measure of complexity, 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).
[12] Gergely Neu, et al. Near-Optimal Rates for Limited-Delay Universal Lossy Source Coding, 2014, IEEE Trans. Inf. Theory.
[13] Elad Hazan, et al. Better Rates for Any Adversarial Deterministic MDP, 2013, ICML.
[14] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT 2009.
[15] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[16] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[17] Thomas M. Cover, et al. Elements of Information Theory, Second Edition, 2005.
[18] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res.
[19] Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2010, IEEE Transactions on Automatic Control.