Combining expert advice in reactive environments
[1] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.
[2] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[3] J. Runnenburg. Probability Theory and Its Applications, 1985.
[4] David Williams, et al. Probability with Martingales, 1991, Cambridge mathematical textbooks.
[5] David Haussler, et al. How to use expert advice, 1993, STOC.
[6] Dean P. Foster, et al. A Randomization Rule for Selecting Forecasts, 1993, Oper. Res.
[7] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[8] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[9] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[10] Vladimir Vovk, et al. A game of prediction with expert advice, 1995, COLT '95.
[11] S. Yakowitz, et al. Machine learning and nonparametric bandit theory, 1995, IEEE Trans. Autom. Control.
[12] Manfred K. Warmuth, et al. How to use expert advice, 1997, JACM.
[13] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[14] D. Fudenberg, et al. The Theory of Learning in Games, 1998.
[15] Y. Freund, et al. Adaptive game playing using multiplicative weights, 1999.
[16] Dean P. Foster, et al. Regret in the On-Line Decision Problem, 1999.
[17] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[18] Nimrod Megiddo, et al. How to Combine Expert (and Novice) Advice when Actions Impact the Environment?, 2003, NIPS.
[19] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[20] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[21] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[22] Dan Suciu, et al. Journal of the ACM, 2006.