On Discontinuous Q-Functions in Reinforcement Learning
[1] Charles W. Anderson, et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning), 1986.
[2] Marvin Minsky, et al. Steps toward Artificial Intelligence, 1995, Proceedings of the IRE.
[3] Patchigolla Kiran Kumar, et al. A Survey of Some Results in Stochastic Adaptive Control, 1985.
[4] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[5] P. J. Werbos, et al. Backpropagation and neurocontrol: a review and prospectus, 1989, International 1989 Joint Conference on Neural Networks.
[6] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[7] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[8] Chris Watkins, et al. Learning from delayed rewards, 1989.
[9] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[10] Sebastian Thrun, et al. Efficient Exploration In Reinforcement Learning, 1992.
[11] F. J. Śmieja, et al. Multiple Network Systems (Minos) Modules: Task Division and Module Discrimination, 1991.
[12] Richard S. Sutton, et al. Learning and Sequential Decision Making, 1989.
[13] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[14] A. G. Barto, et al. Simulation Experiments with Goal-Seeking Adaptive Elements, 1984.
[15] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[16] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[17] Geoffrey E. Hinton. Connectionist Learning Procedures, 1989, Artif. Intell.
[18] Dana H. Ballard, et al. Active Perception and Reinforcement Learning, 1990, Neural Computation.
[19] Paul J. Werbos, et al. An Empirical Test of New Forecasting Methods Derived from a Theory of Intelligence: The Prediction of Conflict in Latin America, 1978, IEEE Transactions on Systems, Man, and Cybernetics.
[20] P. Dayan. The Convergence of TD(λ) for General λ, 2004, Machine Learning.
[21] Dieter Fox, et al. Learning By Error-Driven Decomposition, 1991.
[22] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: recent progress, 1967.
[23] A. L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[24] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[25] Richard E. Korf, et al. Real-time heuristic search: new results, 1988, AAAI 1988.
[26] Richard S. Sutton, et al. Reinforcement Learning is Direct Adaptive Optimal Control, 1992, 1991 American Control Conference.