Learning curve bounds for a Markov decision process with undiscounted rewards
[1] Claude-Nicolas Fiechter,et al. Efficient reinforcement learning , 1994, COLT '94.
[2] Lawrence K. Saul,et al. Markov decision processes in large state spaces , 1995, COLT '95.
[3] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[4] Sompolinsky,et al. Statistical mechanics of learning from examples. , 1992, Physical review. A, Atomic, molecular, and optical physics.
[5] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[6] T. Watkin,et al. The Statistical Mechanics of Learning a Rule , 1993 .
[7] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[8] E. M.,et al. Statistical Mechanics , 2021, On Complementarity.
[9] Stanley J. Rosenschein,et al. Learning to act using real-time dynamic programming , 1996 .
[10] David Haussler,et al. Rigorous Learning Curve Bounds from Statistical Mechanics , 1994, COLT.
[11] M. Marcus,et al. A Survey of Matrix Theory and Matrix Inequalities , 1965 .
[12] D. Haussler,et al. Rigorous Learning Curve Bounds from Statistical Mechanics , 1994, COLT '94.
[13] C. Fiechter. Efficient Reinforcement Learning , 1994 .
[14] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[15] G. Parmigiani. Large Deviation Techniques in Decision, Simulation and Estimation , 1992 .
[16] James A. Bucklew,et al. Large Deviation Techniques in Decision, Simulation, and Estimation , 1990 .