Toward a classification of finite partial-monitoring games
Csaba Szepesvári | Gábor Bartók | Dávid Pál | András Antos
[1] Dean P. Foster,et al. No Internal Regret via Neighborhood Watch , 2011, AISTATS.
[2] Csaba Szepesvári,et al. Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments , 2011, COLT.
[3] Csaba Szepesvári,et al. Non-trivial two-armed partial-monitoring games are bandits , 2011 .
[4] Peter L. Bartlett,et al. Optimal Allocation Strategies for the Dark Pool Problem , 2010, AISTATS.
[5] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits , 2009, COLT 2009.
[6] Csaba Szepesvári,et al. Online Optimization in X-Armed Bandits , 2008, NIPS.
[7] Eli Upfal,et al. Multi-Armed Bandits in Metric Spaces , 2008 .
[8] Shie Mannor,et al. Strategies for Prediction Under Imperfect Monitoring , 2007, Math. Oper. Res..
[9] Elad Hazan,et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.
[10] Nicolò Cesa-Bianchi,et al. Regret Minimization Under Partial Monitoring , 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.
[11] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[12] Richard S. Sutton,et al. Temporal Abstraction in Temporal-difference Networks , 2005, NIPS.
[13] Avrim Blum,et al. Near-optimal online auctions , 2005, SODA '05.
[14] Adam Tauman Kalai,et al. Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.
[15] Gábor Lugosi,et al. Minimizing regret with label efficient prediction , 2004, IEEE Transactions on Information Theory.
[16] Richard S. Sutton,et al. Temporal-Difference Networks , 2004, NIPS.
[17] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[18] Frank Thomson Leighton,et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..
[19] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[20] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[21] Christian Schindelhauer,et al. Discrete Prediction Games with Arbitrary Feedback and Loss , 2001, COLT/EuroCOLT.
[22] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[23] Philip M. Long,et al. Apple Tasting , 2000, Inf. Comput..
[24] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[25] David P. Helmbold,et al. Some label efficient learning results , 1997, COLT '97.
[26] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[27] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[28] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.
[29] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[30] David Haussler,et al. How to use expert advice , 1993, STOC.
[31] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[32] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.
[33] John L. Rhodes,et al. Algebraic Principles for the Analysis of a Biochemical System , 1967, J. Comput. Syst. Sci..
[34] James Hannan,et al. APPROXIMATION TO BAYES RISK IN REPEATED PLAY , 1958 .
[35] Philip Wolfe,et al. Contributions to the theory of games , 1953 .
[36] E. C. Titchmarsh. COMPLEX FOURIER—BESSEL TRANSFORMS , 1948 .
[37] J. Littlewood. ON BOUNDED BILINEAR FORMS IN AN INFINITE NUMBER OF VARIABLES , 1930 .