On the Computational Economics of Reinforcement Learning
[1] A. H. Klopf,et al. Brain Function and Adaptive Systems: A Heterostatic Theory , 1972 .
[2] Y. M. El-Fattah,et al. Recursive Algorithms for Adaptive Control of Finite Markov Chains , 1981 .
[3] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control.
[4] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[5] Richard S. Sutton,et al. Learning and Sequential Decision Making , 1989 .
[6] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[7] Mitsuo Sato,et al. Learning control of finite Markov chains with unknown transition probabilities , 1982 .
[8] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[9] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[10] Michael I. Jordan,et al. Learning to Control an Unstable System with Forward Modeling , 1989, NIPS.
[11] J. S. Riordon. An adaptive automaton controller for discrete-time Markov processes , 1969 .
[12] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[13] Paul J. Werbos,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[14] Graham C. Goodwin,et al. Adaptive filtering prediction and control , 1984 .
[15] Y. Chien,et al. Pattern classification and scene analysis , 1974 .
[16] Mitsuo Sato,et al. An asymptotically optimal learning controller for finite Markov chains with unknown transition probabilities , 1985 .
[17] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[18] Richard S. Sutton,et al. Sequential Decision Problems and Neural Networks , 1989, NIPS.
[19] Richard Wheeler,et al. Decentralized learning in finite Markov chains , 1985, 1985 24th IEEE Conference on Decision and Control.
[20] Ian H. Witten. Exploring, Modelling, and Controlling Discrete Sequential Environments , 1977 .
[21] Paul J. Werbos,et al. Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[22] P. Kumar,et al. Optimal adaptive controllers for unknown Markov chains , 1982 .
[23] P. Mandl,et al. Estimation and control in Markov chains , 1974, Advances in Applied Probability.
[24] Charles W. Anderson,et al. Strategy Learning with Multilayer Connectionist Representations , 1987 .
[25] V. Borkar,et al. Adaptive control of Markov chains, I: Finite parameter set , 1979, 1979 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[26] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[27] K. Narendra,et al. Decentralized learning in finite Markov chains , 1985, 1985 24th IEEE Conference on Decision and Control.
[28] Mitsuo Sato,et al. Learning control of finite Markov chains with an explicit trade-off between estimation and control , 1988, IEEE Trans. Syst. Man Cybern.
[29] Chris Watkins,et al. Learning from delayed rewards , 1989 .