Markov Decision Processes with Arbitrary Reward Processes
[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[2] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.
[3] R. Aumann. Markets with a continuum of traders, 1964.
[4] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[5] S. M. Robinson. Bounds for error in the solution set of a perturbed linear program, 1973.
[6] David M. Kreps, et al. Learning Mixed Equilibria, 1993.
[7] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[8] J. Renegar. Some perturbation theory for linear programming, 1994, Math. Program.
[9] Andrew G. Barto, et al. An Actor/Critic Algorithm that is Equivalent to Q-Learning, 1994, NIPS.
[10] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] J. Filar, et al. Competitive Markov Decision Processes, 1996.
[13] D. Fudenberg, et al. The Theory of Learning in Games, 1998.
[14] Y. Freund, et al. Adaptive game playing using multiplicative weights, 1999.
[15] S. Hart, et al. A General Class of Adaptive Strategies, 1999.
[16] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[17] Ward Whitt, et al. A Nonstationary Offered-Load Model for Packet Networks, 2001, Telecommun. Syst.
[18] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[19] Neri Merhav, et al. On sequential strategies for loss functions with memory, 2002, IEEE Trans. Inf. Theory.
[20] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[21] Ehud Lehrer, et al. A wide range no-regret theorem, 2003, Games Econ. Behav.
[22] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[23] Shie Mannor, et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, 2003, Math. Oper. Res.
[24] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[25] Mark Herbster, et al. Tracking the Best Expert, 1995, Machine Learning.
[26] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.
[27] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, Journal of Computer and System Sciences.
[28] S. Bobkov, et al. Modified Logarithmic Sobolev Inequalities in Discrete Settings, 2006.
[29] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[30] Shie Mannor, et al. Regret minimization in repeated matrix games with variable stage duration, 2008, Games Econ. Behav.