Learning and Exploitation Do Not Conflict Under Minimax Optimality
[1] C. Stein. A Two-Sample Test for a Linear Hypothesis Whose Power is Independent of the Variance , 1945 .
[2] L. Shapley,et al. Stochastic Games , 1953, Proceedings of the National Academy of Sciences.
[3] Onésimo Hernández-Lerma,et al. Controlled Markov Processes , 1965 .
[4] Richard E. Korf,et al. Real-Time Heuristic Search , 1990, Artif. Intell..
[5] Sebastian Thrun,et al. Learning to Play the Game of Chess , 1994, NIPS.
[6] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[7] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[8] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[9] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[10] Csaba Szepesvári. Certainty Equivalence Policies Are Self-optimizing under Minimax Optimality , 1996 .
[11] Csaba Szepesvári. Some basic facts concerning minimax sequential decision processes , 1996 .
[12] Csaba Szepesvári,et al. Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms , 1996 .