Toward a classification of finite partial-monitoring games

[1] Dean P. Foster, et al. No Internal Regret via Neighborhood Watch, 2011, AISTATS.

[2] Csaba Szepesvári, et al. Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments, 2011, COLT.

[3] Csaba Szepesvári, et al. Non-trivial two-armed partial-monitoring games are bandits, 2011, arXiv.

[4] Peter L. Bartlett, et al. Optimal Allocation Strategies for the Dark Pool Problem, 2010, AISTATS.

[5] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.

[6] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.

[7] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.

[8] Shie Mannor, et al. Strategies for Prediction Under Imperfect Monitoring, 2007, Math. Oper. Res.

[9] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[10] Nicolò Cesa-Bianchi, et al. Regret Minimization Under Partial Monitoring, 2006, IEEE Information Theory Workshop (ITW '06, Punta del Este).

[11] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.

[12] Richard S. Sutton, et al. Temporal Abstraction in Temporal-Difference Networks, 2005, NIPS.

[13] Avrim Blum, et al. Near-optimal online auctions, 2005, SODA '05.

[14] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[15] Gábor Lugosi, et al. Minimizing regret with label efficient prediction, 2004, IEEE Transactions on Information Theory.

[16] Richard S. Sutton, et al. Temporal-Difference Networks, 2004, NIPS.

[17] Vladislav Tadic, et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001, Machine Learning.

[18] Frank Thomson Leighton, et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions, 2003, FOCS.

[19] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[20] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[21] Christian Schindelhauer, et al. Discrete Prediction Games with Arbitrary Feedback and Loss, 2001, COLT/EuroCOLT.

[22] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.

[23] Philip M. Long, et al. Apple Tasting, 2000, Inf. Comput.

[24] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[25] David P. Helmbold, et al. Some label efficient learning results, 1997, COLT '97.

[26] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.

[27] László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[28] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[29] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.

[30] David Haussler, et al. How to use expert advice, 1993, STOC.

[31] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[32] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, FOCS.

[33] John L. Rhodes, et al. Algebraic Principles for the Analysis of a Biochemical System, 1967, J. Comput. Syst. Sci.

[34] James Hannan, et al. Approximation to Bayes Risk in Repeated Play, 1958.

[35] Philip Wolfe, et al. Contributions to the Theory of Games, 1953.

[36] E. C. Titchmarsh. Complex Fourier-Bessel Transforms, 1948.

[37] J. Littlewood. On Bounded Bilinear Forms in an Infinite Number of Variables, 1930.