A Counterexample to Temporal Differences Learning
[1] R. Bellman. Dynamic Programming, 1957, Princeton University Press.
[2] Åke Björck, et al. Numerical Methods, 1995.
[3] Harold J. Kushner, et al. Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.
[4] John N. Tsitsiklis, et al. Parallel and Distributed Computation, 1989.
[5] Zhi-Quan Luo, et al. On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks, 1991, Neural Computation.
[6] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[7] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[8] Zhi-Quan Luo, et al. Analysis of an Approximate Gradient Projection Method with Applications to the Backpropagation Algorithm, 1994.
[9] A. Harry Klopf, et al. Advantage Updating Applied to a Differential Game, 1994, NIPS.
[10] O. Mangasarian, et al. Serial and Parallel Backpropagation Convergence via Nonmonotone Perturbed Minimization, 1994.
[11] Ben J. A. Kröse, et al. Learning from Delayed Rewards, 1995, Robotics Auton. Syst.
[12] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[13] O. Nelles, et al. An Introduction to Optimization, 1996, IEEE Antennas and Propagation Magazine.
[14] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.