
Toward a classification of finite partial-monitoring games

Csaba Szepesvári | Gábor Bartók | Dávid Pál | András Antos

[1] Dean P. Foster, et al. No Internal Regret via Neighborhood Watch, 2011, AISTATS.

[2] Csaba Szepesvári, et al. Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments, 2011, COLT.

[3] Csaba Szepesvári, et al. Non-trivial two-armed partial-monitoring games are bandits, 2011, arXiv.

[4] Peter L. Bartlett, et al. Optimal Allocation Strategies for the Dark Pool Problem, 2010, AISTATS.

[5] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.

[6] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.

[7] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.

[8] Shie Mannor, et al. Strategies for Prediction Under Imperfect Monitoring, 2007, Math. Oper. Res.

[9] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[10] Nicolò Cesa-Bianchi, et al. Regret Minimization Under Partial Monitoring, 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.

[11] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[12] Richard S. Sutton, et al. Temporal Abstraction in Temporal-difference Networks, 2005, NIPS.

[13] Avrim Blum, et al. Near-optimal online auctions, 2005, SODA '05.

[14] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[15] Gábor Lugosi, et al. Minimizing regret with label efficient prediction, 2004, IEEE Transactions on Information Theory.

[16] Richard S. Sutton, et al. Temporal-Difference Networks, 2004, NIPS.

[17] Vladislav Tadic, et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001, Machine Learning.

[18] Frank Thomson Leighton, et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[19] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[20] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[21] Christian Schindelhauer, et al. Discrete Prediction Games with Arbitrary Feedback and Loss, 2001, COLT/EuroCOLT.

[22] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.

[23] Philip M. Long, et al. Apple Tasting, 2000, Inf. Comput.

[24] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[25] David P. Helmbold, et al. Some label efficient learning results, 1997, COLT '97.

[26] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.

[27] László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[28] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[29] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[30] David Haussler, et al. How to use expert advice, 1993, STOC.

[31] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[32] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.

[33] John L. Rhodes, et al. Algebraic Principles for the Analysis of a Biochemical System, 1967, J. Comput. Syst. Sci.

[34] James Hannan, et al. Approximation to Bayes Risk in Repeated Play, 1958.

[35] Philip Wolfe, et al. Contributions to the theory of games, 1953.

[36] E. C. Titchmarsh. Complex Fourier–Bessel Transforms, 1948.

[37] J. Littlewood. On Bounded Bilinear Forms in an Infinite Number of Variables, 1930.