Reinforcement Learning and Markov Decision Processes
[1] Leslie Pack Kaelbling, et al. Planning under Time Constraints in Stochastic Domains, 1993, Artif. Intell.
[2] Marco Colombetti, et al. Robot Shaping: An Experiment in Behavior Engineering, 1997.
[3] Andrew G. Barto, et al. Autonomous shaping: knowledge transfer in reinforcement learning, 2006, ICML.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Bohdana Ratitch. On characteristics of Markov decision processes and reinforcement learning in large domains, 2005.
[6] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[7] Sven Koenig, et al. The interaction of representations and planning objectives for decision-theoretic planning tasks, 2002, J. Exp. Theor. Artif. Intell.
[8] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[9] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[10] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[11] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[12] Gary L. Drescher, et al. Made-up minds - a constructivist approach to artificial intelligence, 1991.
[13] Stuart J. Russell, et al. Control Strategies for a Stochastic Planner, 1994, AAAI.
[14] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.
[15] Sašo Džeroski, et al. Relational Reinforcement Learning, 2001, Machine Learning.
[16] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim.
[17] Richard S. Sutton, et al. Reinforcement learning architectures for animats, 1991.
[18] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.
[19] Michael Wooldridge, et al. Artificial Intelligence Today, 1999, Lecture Notes in Computer Science.
[20] Marco Wiering. Model-based reinforcement learning in dynamic environments, 2002.
[21] Marcus A. Maloof, et al. Incremental rule learning with partial instance memory for changing concepts, 2003, Proceedings of the International Joint Conference on Neural Networks.
[22] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[23] Kary Främling. Bi-Memory Model for Guiding Exploration by Pre-existing Knowledge, 2005.
[24] Peter Dayan, et al. Technical Note: Q-Learning, 1992, Machine Learning.
[25] Axel van Lamsweerde, et al. Learning machine learning, 1991.
[26] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[27] K. R. Dixon, et al. Incorporating Prior Knowledge and Previously Learned Information into Reinforcement Learning Agents, 2000.
[28] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[29] Geoffrey J. Gordon, et al. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees, 2005, ICML.
[30] Marco Wiering. Explorations in efficient reinforcement learning, 1999.
[31] Matthijs T. J. Spaan, et al. Approximate planning under uncertainty in partially observable environments, 2002.
[32] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[33] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[34] Chris Watkins, et al. Learning from delayed rewards, 1989.
[35] Blai Bonet, et al. Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback, 2003, IJCAI.
[36] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[37] Jonathan Schaeffer, et al. Kasparov versus Deep Blue: The Rematch, 1997, J. Int. Comput. Games Assoc.
[38] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[39] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[40] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[41] Marco Wiering. QV(λ)-learning: A New On-policy Reinforcement Learning Algorithm, 2005.
[42] Andrew W. Moore, et al. Prioritized sweeping: Reinforcement learning with less data and less time, 1993, Machine Learning.
[43] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[44] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[45] Bernard Widrow, et al. The Truck Backer-Upper, 1990.
[47] Jürgen Schmidhuber, et al. Efficient model-based exploration, 1998.
[48] Geoffrey J. Gordon, et al. Point-based approximations for fast POMDP solving, 2006.
[49] Blai Bonet, et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, 2003, ICAPS.
[50] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[51] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[52] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[53] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[54] W. Matthews. Mazes and Labyrinths: A General Account of Their History and Developments, 2015, Nature.
[55] Wayne L. Winston. Operations Research: Applications and Algorithms, 1988.
[56] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[57] Shlomo Zilberstein, et al. LAO*: A heuristic search algorithm that finds solutions with loops, 2001, Artif. Intell.
[58] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[59] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[60] Craig Boutilier, et al. Knowledge Representation for Stochastic Decision Processes, 1999, Artificial Intelligence Today.
[61] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[62] Martijn van Otterlo, et al. A survey of reinforcement learning in relational domains, 2005.
[63] Adaptive State-Space Quantisation and Multi-Task Reinforcement Learning Using …, 2000.
[64] R. Bellman. Dynamic Programming, 1957.
[65] Jürgen Schmidhuber, et al. Fast Online Q(λ), 1998, Machine Learning.
[66] Anthony Stentz, et al. Focused Dynamic Programming: Extensive Comparative Results, 2004.
[67] Martijn van Otterlo, et al. The Logic of Adaptive Behavior - Knowledge Representation and Algorithms for Adaptive Sequential Decision Making under Uncertainty in First-Order and Relational Domains, 2009, Frontiers in Artificial Intelligence and Applications.
[68] Sridhar Mahadevan. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results, 1996, Machine Learning.
[69] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[70] Nicholas Kushmerick, et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.