[1] Philip Wolfe,et al. Contributions to the theory of games , 1953 .
[2] James Hannan,et al. Approximation to Bayes Risk in Repeated Play , 1958 .
[3] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[4] Manfred K. Warmuth,et al. Path Kernels and Multiplicative Updates , 2002, J. Mach. Learn. Res..
[5] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[6] Manfred K. Warmuth,et al. Path kernels and multiplicative updates , 2003 .
[7] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[8] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.
[9] Jan Poland,et al. FPL Analysis for Adaptive Bandits , 2005, SAGA.
[10] Peter Auer,et al. Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring , 2006, ALT.
[11] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[12] Tamás Linder,et al. The On-Line Shortest Path Problem Under Partial Monitoring , 2007, J. Mach. Learn. Res..
[13] Thomas P. Hayes,et al. The Price of Bandit Information for Online Optimization , 2007, NIPS.
[14] Nicolò Cesa-Bianchi,et al. Combinatorial Bandits , 2012, COLT.
[15] Jean-Yves Audibert,et al. Minimax Policies for Bandits Games , 2009, COLT.
[16] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[17] Csaba Szepesvári,et al. The Online Loop-free Stochastic Shortest-Path Problem , 2010, COLT.
[18] Ohad Shamir,et al. Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..
[19] Wouter M. Koolen,et al. Hedging Structured Concepts , 2010, COLT.
[20] Shuji Kijima,et al. Online Prediction under Submodular Constraints , 2012, ALT.
[21] Sham M. Kakade,et al. Towards Minimax Policies for Online Linear Optimization with Bandit Feedback , 2012, COLT.
[22] Luc Devroye,et al. Prediction by random-walk perturbation , 2013, COLT.
[23] Gábor Lugosi,et al. Regret in Online Combinatorial Optimization , 2012, Math. Oper. Res..
[24] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.