Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
[1] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[2] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[3] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[4] Ronald Ortner,et al. Pseudometrics for State Aggregation in Average Reward Markov Decision Processes , 2007, ALT.
[5] Tackseung Jun. A Survey on the Bandit Problem with Switching Costs , 2004, De Economist.
[6] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[7] Robert E. Tarjan,et al. Faster parametric shortest path and minimum-balance algorithms , 1991, Networks.
[8] T. Lai,et al. Optimal Learning and Experimentation in Bandit Problems , 2000 .
[9] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[10] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem , 2004, NIPS.
[11] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[12] D. Teneketzis,et al. Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost , 1988 .
[13] Sandy Irani,et al. Efficient algorithms for optimum cycle mean and optimum cost to time ratio problems , 1999, DAC '99.
[14] Jeffrey J. Hunter,et al. Mixing times with applications to perturbed Markov chains , 2006 .
[15] Omid Madani,et al. Polynomial Value Iteration Algorithms for Deterministic MDPs , 2002, UAI.
[16] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[17] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[18] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[19] Ronald Ortner. Online Regret Bounds for Markov Decision Processes with Deterministic Transitions , 2008, ALT.
[20] Richard M. Karp,et al. A characterization of the minimum cycle mean in a digraph , 1978, Discret. Math..
[21] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[22] C. D. Meyer,et al. Markov chain sensitivity measured by mean first passage times , 2000 .
[23] Yishay Mansour,et al. Experts in a Markov Decision Process , 2004, NIPS.
[24] Rajesh K. Gupta,et al. Faster maximum and minimum mean cycle algorithms for system-performance analysis , 1998, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..
[25] Peter Auer,et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem , 2007, COLT.
[26] U. Rieder,et al. Markov Decision Processes , 2010 .
[27] James B. Orlin,et al. Finding minimum cost to time ratio cycles with small integral transit times , 1993, Networks.
[28] Ambuj Tewari,et al. Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs , 2007, NIPS.
[29] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.