Markov Decision Processes: Concepts and Algorithms
[1] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[2] Chris Watkins. Learning from delayed rewards, 1989.
[3] Sridhar Mahadevan. Average reward reinforcement learning: Foundations, algorithms, and empirical results, 1996, Machine Learning.
[4] Gavin Adrian Rummery. Problem solving with reinforcement learning, 1995.
[5] Kary Främling. Bi-Memory Model for Guiding Exploration by Pre-existing Knowledge, 2005.
[6] Martijn van Otterlo. The logic of adaptive behavior: knowledge representation and algorithms for the Markov decision process framework in first-order domains, 2008.
[7] Bohdana Ratitch. On characteristics of Markov decision processes and reinforcement learning in large domains, 2005.
[8] Marcus A. Maloof et al. Incremental rule learning with partial instance memory for changing concepts, 2003, Proceedings of the International Joint Conference on Neural Networks.
[9] Leslie Pack Kaelbling et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[10] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[11] Andrew W. Moore et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[12] John N. Tsitsiklis et al. Actor-Critic Algorithms, 1999, NIPS.
[13] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[14] W. Matthews. Mazes and Labyrinths: A General Account of Their History and Developments, 2015, Nature.
[15] Richard S. Sutton et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[16] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[17] Adaptive State-Space Quantisation and Multi-Task Reinforcement Learning Using …, 2000.
[18] Andrew W. Moore et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[20] Craig Boutilier et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[21] Nicholas Kushmerick et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.
[22] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[23] Ian H. Witten. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[24] Sven Koenig et al. The interaction of representations and planning objectives for decision-theoretic planning tasks, 2002, J. Exp. Theor. Artif. Intell.
[25] Anton Schwartz. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[26] Marco Wiering. Explorations in efficient reinforcement learning, 1999.
[27] Craig Boutilier et al. Knowledge Representation for Stochastic Decision Processes, 1999, Artificial Intelligence Today.
[28] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[29] Jonathan Schaeffer et al. Kasparov versus Deep Blue: The Rematch, 1997, J. Int. Comput. Games Assoc.
[30] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[31] Wayne L. Winston. Operations research: applications and algorithms, 2004.
[32] Mahesan Niranjan et al. On-line Q-learning using connectionist systems, 1994.
[33] Stuart I. Reynolds. Reinforcement Learning with Exploration, 2002.
[34] Richard S. Sutton. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.
[35] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.
[36] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.